University of Southern California

DIVA Chip Breaks Through Memory Wall

July 29, 2002

USC/ISI Device Will Be Evaluated for Use in HP Platform with DARPA funding. Applications Seen in Multimedia, Complex Scientific Modeling and Database Access.

Researchers from the University of Southern California School of Engineering's Information Sciences Institute (ISI) demonstrated a Processor in Memory (PIM) chip, or "Smart Memory" chip, at the DarpaTech 2002 Symposium in Anaheim, CA, on July 31. The chip can speed some calculations by at least an order of magnitude.

At the demonstration, a representative of DARPA's High Productivity Computing Systems (HPCS) initiative announced that HP will collaborate with ISI to evaluate the chip for use in the company's McKinley server.

The DIVA chip was prototyped by MOSIS, ISI's chip brokerage service. ISI computer scientist John Granacki, co-leader of the Data IntensiVe Architecture (DIVA) project that created the new chip, said it addresses a longstanding and growing mismatch in computer components.

While central processing units (CPUs) are running ever faster, much of the data that these chips process comes from separate random access memory (RAM) chips, and the connection between the two has become a bottleneck that restricts performance.

For many years, computer scientists have experimented with combining CPU and RAM functions on a single piece of silicon. "RAM chips are finally dense enough that we can afford the space for processor logic on them," explained DIVA co-leader Mary Hall. "Because the processors and memory are so much closer and are on the same chip, this design not only cuts the time delay per computation but also increases the potential bandwidth for data transfer between them."

"The DIVA PIM chip is not the first device that has a processor-in-memory functionality," said Hall. "Computer scientists have been talking about the potential of PIM chips for most of the past decade and have released devices they call PIM chips, but this is the first smart-memory device designed to support virtual addressing and capable of executing multiple threads of control."

Granacki said that other PIM devices have had "a strict, unchangeable protocol that limits their usefulness."

Existing CPUs do have a small amount of memory, called a cache, built in. For many applications, this substantially speeds processing. But for many others, cache capacity isn't sufficient, and the processor must wait while data is sought and retrieved from separate memory chips.

The delay caused by this process is called the "memory wall." The DIVA hardware design removes it by vastly expanding cache memory. The chip can also take much better advantage of software that processes 256 bits of information, rather than the standard 32, in each operation cycle.
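The 256-bit versus 32-bit comparison can be made concrete with a little arithmetic. The sketch below is an idealized operation count, not a model of the DIVA hardware: the function name `ops_needed` and its parameters are hypothetical, and it assumes one operation per cycle, 32-bit data elements, and no memory latency.

```python
import math


def ops_needed(num_elements, element_bits=32, datapath_bits=32):
    """Count operations needed to touch every element once.

    Assumption (illustrative only): each operation handles as many
    whole elements as fit in the datapath width.
    """
    lanes = datapath_bits // element_bits  # elements handled per operation
    return math.ceil(num_elements / lanes)


# 1,000 32-bit values on a 32-bit datapath versus a 256-bit datapath.
narrow = ops_needed(1000, datapath_bits=32)
wide = ops_needed(1000, datapath_bits=256)
print(narrow, wide, narrow / wide)
```

Under these assumptions a 256-bit operation covers eight 32-bit values at once, so the ideal reduction in operation count is eightfold. Real gains depend on how well the software's data fits the wide format.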

Groups of DIVA chips can serve both as a parallel processor, performing most of the program calculations internally, and as a set of "smart coprocessors." Instead of dragging each piece of stored data to the central processor for computation, the PIM chip passes each computation to the processor unit or node that is nearest to the data it needs, said Hall.
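The idea of sending the computation to the data, rather than the data to the processor, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the DIVA programming interface: the class `PimNode` and the functions `host_side_sum` and `in_memory_sum` are hypothetical names, and "words moved" is a simple stand-in for traffic between memory and the central processor.

```python
from dataclasses import dataclass, field


@dataclass
class PimNode:
    """Hypothetical memory node that holds data and can compute on it."""
    values: list = field(default_factory=list)

    def local_sum(self):
        # The work happens next to the data; only the result leaves the node.
        return sum(self.values)


def host_side_sum(nodes):
    """Conventional path: every stored word travels to the central processor."""
    total = 0
    words_moved = 0
    for node in nodes:
        for value in node.values:
            words_moved += 1
            total += value
    return total, words_moved


def in_memory_sum(nodes):
    """PIM-style path: each node reduces its own data and returns one word."""
    partials = [node.local_sum() for node in nodes]
    return sum(partials), len(partials)


# Four nodes, each holding 1,000 integers.
nodes = [PimNode(list(range(i * 1000, (i + 1) * 1000))) for i in range(4)]
host_total, host_words = host_side_sum(nodes)
pim_total, pim_words = in_memory_sum(nodes)
print(host_total == pim_total, host_words, pim_words)
```

Both paths produce the same answer, but in this toy model the in-memory path sends one word per node to the host instead of every stored word.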

Tests are proceeding to gauge how completely the new chip realizes the hopes of its designers. The researchers said the new chips have executed some benchmark tests more than 10 times faster than conventional systems. The team believes the chips have the potential for speedups of as much as several hundredfold.

The researchers said the PIM chip contains 55 million transistors and is one of the largest functioning chips to result from academic research. Use of the MOSIS brokerage made it possible to produce the ambitious architecture economically. The DIVA team hopes to produce a full prototype system with chip groupings by 2003. The unit to be demonstrated at the conference, a unified module of two DIVA chips, represents a substantial step in that direction.

The just-announced HPCS effort will attempt to insert 16 or 32 PIM units into the HP McKinley machine, according to Jeff Draper, who led the VLSI section of the DIVA team. Researchers expect the first machine to be ready for testing in 12 to 18 months.

ISI cooperated on the development of the system architecture with researchers at the University of Notre Dame, Caltech, the University of Delaware, and Alphatech Inc., of Burlington, MA.

Besides Granacki, Hall and Draper, major contributions came from ISI researchers Jacqueline Chame (Simulation, Benchmarking and Compiler), Jeff LaCoss (Emulator), Tim Barrett (System Integration), Jeff Sondeen (VLSI), Dale Chase (Emulator and System Integration) and Craig Steele. Many USC graduate students also contributed to this project.

The DIVA effort was funded by the Defense Advanced Research Projects Agency, host of the conference.

Part of the USC School of Engineering, the USC Information Sciences Institute is one of the nation's leading computer science research and development centers, with a broad research program in artificial intelligence, networking and communications, software generation, integrated circuits and microdevice fabrication, and parallel and distributed computing. ISI, with a staff of 350, has two research locations: the main facility in Marina del Rey, California, and an East Coast facility in Arlington, Virginia.