Harder Working Transistors Through Automated FPGA Compiling

February 21, 2003

What if hardware that was optimized for video decompression to play DVDs could be re-optimized, easily and almost instantaneously, to do high-security encryption of personal data for the internet?

On Feb. 25, USC computer scientist Pedro Diniz presented his most recent benchmark in progress toward that goal at a computing conference in Northern California.

"We are developing programming tools to automatically synthesize near-optimal chip architectures tuned for each application," he said, computing systems in which software is automatically optimized to flexible chips.

Diniz, a research assistant professor at the USC School of Engineering's Information Sciences Institute, has been combining the abilities of a new computing platforms called Field Programmable Gate Arrays (FPGAs) with novel, sophisticated compiling techniques.

Diniz says that with these techniques FPGAs make much more effective use of the available transistors for very irregular applications that perform poorly on current processor architectures.

Specifically, in a presentation at the FPGA 2003 Conference in Monterey, California, Diniz and collaborator Joonseok Park described adapting standard software for a task called "simple spatial queries over spatial sparse-mesh and quad- tree data structures" for an FPGA using their techniques.

The result: their low-clockrate and elderly (3-year-old) FPGA platform matched the performance of a top of the line, new- generation workstation.

"This experience suggests that the integration in memory of FPGA-like fabrics for implementing smart memory engines should be performance-wise very advantageous," Diniz wrote.

Diniz believes that reconfigurable logic design techniques of the type he described at the conference promise to allow future generation of processors to use their power more effeciently.

"In today's processor designs transistor are locked-down for specific functions. Using reconfigurable techniques one can envision future processor architecture to morph into distinct topologies, suitable for each application at hand."

While transistors are cheap, he noted, power consumption and usage is increasingly an issue, particularly for portable devices.

The promise of FPGAs has been slow to materialize because of the lack of software tools that facilitate the mapping and synthesis of custom architectures, according to Diniz. "So, creation of such software tools is what we have been working toward,"

Diniz is working with ISI researcher Mary Hall on programming next-generation of FPGAs, so-called "System on a Chip" devices with still more flexibility and potential power.

One such system has actually been realized by an ISI team including Hall and Diniz. That chip, a "Processor in Memory" chip called DIVA, is now being investigated under a grant from DARPA for integration into a HP system.

"The advantages of such architecture depend upon being able to use established applications without drastic re- engineering," says Diniz. "We believe our experience so far shows this will be possible."

Diniz and Parks research on the FPGA project presented at FPGA 2003 was financed by DARPA. NSF has approved a grant for a new project, called SLATE that will carry this work forward.

FPGA chip used by USC/ISI researchers

Abstract of the Diniz/Park presentation FPGAs have appealing features such as customizable internal and external bandwidth and the ability to exploit vast amounts of fine-grain instruction-level parallelism. In this paper we explore the applicability of these features in using FPGAs as data search and reorganization engines for performing search and reorganization computations over spatial pointer-based data structures for which traditional computing platforms perform poorly. The preliminary experiments, for a set of simple spatial queries over spatial sparse-mesh and quad-tree data structures, reveal that 3 year-old FPGA devices can deliver performance that is on par and in some instances even superior to that of today's workstations. This experience suggests that the integration in memory of FPGA-like fabrics for implementing smart memory engines should be performance-wise very advantageous.