Publications
HiDisc: A Decoupled Architecture for the future
Abstract
We present an architecture that distributes processors throughout the memory hierarchy. The architecture tolerates memory latency, provides simple instruction-level parallelism, and performs well with small cache block sizes (which reduce memory traffic). The processors’ independent instruction streams produce instruction-level parallelism while reducing interactions between instructions. This reduces the need for global synchronization and support hardware, and it increases memory latency tolerance. Additional instruction-level parallelism can still be exploited at each level. A three-processor version of the architecture achieves speedups comparable to those of current superscalar processors on scientific benchmarks. We also discuss the relative merits of this and other architectures in light of technology trends that will allow a billion transistors on a chip.
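The latency-tolerance argument can be illustrated with a toy cycle-count model of two decoupled instruction streams. This is a minimal sketch under stated assumptions, not the HiDisc design or its evaluation. It assumes a generic decoupled access/execute pair, where an access stream issues one load per cycle into a bounded FIFO and an execute stream consumes the loaded values in order at one cycle each. It also assumes a fixed memory latency. The function names and parameters (`coupled_cycles`, `decoupled_cycles`, `latency`, `queue_depth`) are hypothetical and chosen only for illustration.

```python
def coupled_cycles(n, latency):
    """Single in-order stream (hypothetical baseline): every load stalls
    for the full memory latency before its 1-cycle compute step."""
    return n * (latency + 1)


def decoupled_cycles(n, latency, queue_depth):
    """Toy decoupled model (assumption, not HiDisc itself): an access stream
    issues at most one load per cycle into a FIFO of `queue_depth` slots,
    and an execute stream consumes items in order, 1 cycle per item."""
    assert queue_depth >= 1, "the queue needs at least one slot"
    issue_time = [0] * n
    finish = [0] * n
    for i in range(n):
        # The access stream issues at most one load per cycle.
        earliest = issue_time[i - 1] + 1 if i > 0 else 0
        # Bounded queue: load i needs a free slot, so item i - queue_depth
        # must already have been consumed by the execute stream.
        if i >= queue_depth:
            earliest = max(earliest, finish[i - queue_depth])
        issue_time[i] = earliest
        ready = issue_time[i] + latency  # data arrives after the latency
        # The execute stream starts item i once its data is ready and the
        # previous item is done.
        start = max(ready, finish[i - 1] if i > 0 else 0)
        finish[i] = start + 1
    return finish[-1] if n else 0


if __name__ == "__main__":
    print(coupled_cycles(100, 10))        # 1100
    print(decoupled_cycles(100, 10, 16))  # 110
    print(decoupled_cycles(100, 10, 1))   # 1100
```

In this toy model, a deep enough queue lets the access stream run ahead of the execute stream. The latency is then paid roughly once (110 cycles for 100 items at latency 10) instead of once per item (1100 cycles). A one-slot queue degenerates to the coupled case. These numbers come only from the sketch's assumptions and are not results from the paper.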
- Date: February 1, 1997
- Authors: Stephen P Crago, Alvin M Despain
- Journal: University of Southern California