HiDisc: A Decoupled Architecture for the Future

Abstract

We present an architecture that distributes processors throughout the memory hierarchy. The architecture tolerates memory latency, provides simple instruction-level parallelism, and performs well with small cache block sizes, which reduce memory traffic. The processors’ independent instruction streams produce instruction-level parallelism while reducing the interactions between instructions. This allows a reduction in global synchronization and support hardware, and increases memory latency tolerance. Further instruction-level parallelism can still be exploited at each level of the memory hierarchy. A three-processor version of the architecture produces speedups comparable to those of current superscalar processors on scientific benchmarks. We discuss the relative merits of this and other architectures in light of technology trends that will allow a billion transistors on a chip.

Date: February 1, 1997
Authors: Stephen P Crago, Alvin M Despain
Journal: University of Southern California
