Publications
Improving the performance of loop-based programs using a prefetch processor
Abstract
We present an architecture called the CAPP (Computing And Prefetching Processor). The CAPP delivers high performance for loop-based scientific and signal-processing programs by improving memory system performance with a decoupled prefetch processor. The prefetch processor relieves the main processor of the overhead of executing prefetch instructions and allows the prefetch distance to vary adaptively at run time. In this paper, we present the CAPP architecture, a sample program that shows how the architecture works, and simulation results for five Livermore Loops, discrete convolution, and one other benchmark. The simulation results show speedups of up to a factor of two to three for the CAPP compared with a uniprocessor with prefetching. The performance advantage of the CAPP architecture increases as the miss penalty grows relative to the processor cycle time, making it an attractive architecture as the gap between processor speed and DRAM speed continues to grow exponentially.
- Date: November 8, 1996
- Authors: Stephen P Crago, Alvin M Despain
- Journal: Submitted to the 24th Annual International Symposium on Computer Architecture