BlueGene/L: System Software Design for Lightweight Operation in a Massively Parallel System
Jose E. Moreira, IBM Thomas J. Watson Research Center
Abstract
BlueGene/L is a new massively parallel supercomputer being developed by IBM in collaboration with Lawrence Livermore National Laboratory. BlueGene/L uses system-on-a-chip integration and a highly scalable cellular architecture to deliver 360 Tflop/s of peak computing power. With 65,536 compute nodes, BlueGene/L represents a new level of scalability for parallel systems and creates specific challenges in the design of the operating environment. In this talk, we describe how we have addressed these challenges by pursuing a system software architecture that provides a lightweight implementation of familiar environments. Central to our architecture is the concept of off-loading functions to specialized hardware, leaving the core computational engines dedicated to running application code. This approach also supports our primary usability goal of providing a user experience similar to what application developers are used to in other parallel systems. BlueGene/L is not usually thought of as a processor-in-memory (PIM) design. Yet many of its architectural concepts are relevant to PIM, and its software architecture can be of interest to that class of systems.