Renesas builds massively parallel processor

February 9, 2006 – Renesas Technology Corp. has developed a massively parallel processor based on a matrix architecture, designed specifically to handle image and audio multimedia data processing tasks.

The device features tight coupling of 2048 processing elements and 1Mbit SRAM, and has been confirmed to achieve 40 GOPS (giga operations/sec) performance at a 200MHz clock frequency.

Multimedia applications keep raising the demands placed on processors, through increasing pixel counts and a general demand for programmable devices. Meeting those demands has generally meant increasing operating frequency using finer semiconductor processes, which involves a tradeoff between high performance and low power consumption. Recent architecture advances such as MIMD (multiple instruction multiple data) processors increase performance but also make it difficult to reduce power consumption.

To solve these issues, Renesas has developed a matrix-type processor based on a different memory technology from that of a DSP or MIMD-type processor. It is a fine-grained SIMD (single instruction multiple data) massively parallel programmable device in which each 2-bit processing element (PE) is assigned 512 bits of SRAM as data registers. The device includes 2048 PEs and a total of 1 Mbit of SRAM, with tight coupling between PEs.
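The memory figures and the fine-grained SIMD idea can be illustrated with a short Python sketch. The PE count, PE width and per-PE SRAM size come from the article; the 16-bit word length, the function names and the software model itself are assumptions made for illustration, not a description of Renesas's actual circuit.

```python
NUM_PES = 2048          # number of processing elements (from the article)
SRAM_BITS_PER_PE = 512  # data-register SRAM per PE (from the article)
PE_WIDTH = 2            # bits each PE handles per step (from the article)


def total_sram_bits():
    """Total data-register SRAM across all PEs, in bits."""
    return NUM_PES * SRAM_BITS_PER_PE


def simd_add(a_words, b_words, word_bits=16):
    """Add two vectors element-wise, one element per PE, 2 bits per step.

    Hypothetical model: word_bits=16 is an assumed operand width.
    Results wrap around modulo 2**word_bits.
    """
    mask = (1 << PE_WIDTH) - 1
    sums = [0] * len(a_words)
    carries = [0] * len(a_words)
    for shift in range(0, word_bits, PE_WIDTH):
        # One broadcast instruction: every PE does the same 2-bit add
        # on its own slice of data (single instruction, multiple data).
        for pe in range(len(a_words)):
            a = (a_words[pe] >> shift) & mask
            b = (b_words[pe] >> shift) & mask
            t = a + b + carries[pe]
            sums[pe] |= (t & mask) << shift
            carries[pe] = t >> PE_WIDTH
    return sums


print(total_sram_bits())                 # 1048576 bits, i.e. 1 Mbit
print(simd_add([1, 300], [2, 500]))      # [3, 800]
```

In this model a 16-bit addition takes eight 2-bit steps, but all PEs advance together, which is where the parallel throughput would come from.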

Reductions in area and power consumption were achieved by utilizing horizontal (H) and vertical (V) channels that interconnect the PEs. A special technique in the 2-bit PE circuit configuration, called a valid flag, selects execution of an H channel or V channel data transfer (or a PE operation itself). This allows a conditional jump to be performed every clock cycle, speeding up butterfly computations. Also, the single-port SRAM device implements a 3-port data register using SRAM with 2 banks, with a memory “read-modify-write” operation that writes the output data over the data used in the read.
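A rough software analogy shows how a per-PE flag can take the place of branching in a butterfly computation. In the Python sketch below, `h_shift` stands in for a transfer over the H channel and the `valid` list stands in for the valid flag; both names, the wrap-around behaviour and the add/subtract butterfly are illustrative assumptions, not details confirmed by the article.

```python
def h_shift(values, distance):
    """Model an H-channel transfer: PE i receives the value held by
    PE (i + distance), wrapping at the ends (wrapping is an assumption)."""
    n = len(values)
    return [values[(i + distance) % n] for i in range(n)]


def butterfly_stage(values, stride):
    """One butterfly stage: each pair (a, b) at distance `stride`
    becomes (a + b, a - b), with no per-element branching on data."""
    n = len(values)
    # Per-PE flag: True in the PE holding the first element of each pair.
    valid = [(i // stride) % 2 == 0 for i in range(n)]
    from_right = h_shift(values, stride)    # partner for flagged PEs
    from_left = h_shift(values, -stride)    # partner for unflagged PEs
    return [
        values[i] + from_right[i] if valid[i] else from_left[i] - values[i]
        for i in range(n)
    ]


data = [1, 2, 3, 4]
stage1 = butterfly_stage(data, 2)     # [4, 6, -2, -2]
stage2 = butterfly_stage(stage1, 1)   # [10, -2, -4, 0]
print(stage1, stage2)
```

Applying the stages with strides 2 and 1 to four values yields their Walsh-Hadamard transform, a simple member of the family of butterfly-based transforms.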

The resulting prototype processor, implemented in 90nm CMOS with a core area of 3.1 sq. mm, achieved processing performance of 40 GOPS at a 200MHz clock frequency and 250mW power dissipation. Compared with a conventional in-house DSP, that is approximately 70x better performance per unit area and 13x better performance per unit power.
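The headline figures can be combined into a few derived numbers. The short Python sketch below uses only the values quoted above; the function name and the choice of derived metrics are illustrative assumptions, and no figures for the comparison DSP are inferred.

```python
PERF_GOPS = 40.0     # reported performance, giga operations/sec
CLOCK_MHZ = 200.0    # reported clock frequency
POWER_W = 0.250      # reported power dissipation (250mW)
AREA_MM2 = 3.1       # reported core area


def derived_metrics():
    """Derive per-cycle, per-watt and per-area figures from the article's numbers."""
    return {
        # GOPS * 1000 gives mega-operations/sec; dividing by MHz gives ops/cycle.
        "ops_per_cycle": PERF_GOPS * 1000.0 / CLOCK_MHZ,
        "gops_per_watt": PERF_GOPS / POWER_W,
        "gops_per_mm2": PERF_GOPS / AREA_MM2,
    }


for name, value in derived_metrics().items():
    print(f"{name}: {value:.1f}")
```

This works out to 200 operations per clock cycle, 160 GOPS per watt, and roughly 12.9 GOPS per sq. mm of core area.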

