Commutativity Analysis

Pedro Diniz (pedro@isi.edu)


Summary

Parallel machines offer the promise of significant performance increases by enabling multiple processors to concurrently execute different parts of a computation. Programmers have traditionally developed applications for parallel machines using explicitly parallel languages. Unlike serial programming languages, explicitly parallel languages present a complex programming model, characterized by complications such as deadlock, nondeterministic execution and, on message-passing machines, the need to directly manage the movement of data through the computation. The obvious programming advantages of the sequential imperative paradigm have therefore inspired the development of analysis techniques and compilers designed to automatically parallelize serial programs.

Current parallelizing compilers preserve the semantics of the original serial program by preserving its data dependences. These compilers attempt to identify independent pieces of computation (two pieces of computation are independent if neither writes a piece of data that the other accesses), then generate code that executes independent pieces concurrently. A significant limitation of this approach is the difficulty of performing dependence analysis that is precise enough to expose substantial amounts of concurrency. While researchers have developed reasonably effective algorithms for loop nests that manipulate dense matrices using affine access functions, there has been little progress towards the successful automatic analysis of programs that manipulate pointer-based data structures. Researchers have attempted to build fully automatic systems, but the most promising approaches require the programmer to provide annotations that specify information about the global topology of the manipulated data structures. A second, more fundamental limitation of data-dependence-based approaches is their inability to parallelize computations that manipulate graph-like data structures. The aliases inherently present in these data structures preclude the static discovery of independent pieces of code, forcing the compiler to generate serial code. Finally, preserving the data dependences of programs that periodically update shared data structures can artificially limit the amount of exposed concurrency, since tasks must delay updates until they are sure that no update will change the relative order of reads and writes to the shared data structure.
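To make the aliasing limitation concrete, consider the following minimal C++ sketch (the type and function names are ours, purely illustrative). If the structure is a tree, the two child updates write disjoint nodes and are independent; if it is a DAG or a general graph, the children may refer to the same node, so a dependence-based compiler cannot prove independence and must generate serial code:

    struct Node {
        int value = 0;
        Node *left = nullptr, *right = nullptr;
    };

    void update(Node *n) { if (n) n->value += 1; }   // writes *n

    void updateChildren(Node *n) {
        update(n->left);    // independent of the next call only if left != right
        update(n->right);   // with aliasing, both calls may write the same node
    }

    int main() {
        Node shared, root;
        root.left = &shared;
        root.right = &shared;   // both children alias the same node
        updateChildren(&root);  // the two update calls are not independent
        return 0;
    }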

Commutativity analysis is a static analysis framework for detecting commuting operations. Two operations commute when they generate the same result regardless of the order in which they execute. Commutativity analysis eliminates many of the limitations of existing data-dependence-based approaches. Instead of preserving the relative order of individual reads and writes to single words of memory, commutativity analysis aggregates both data and computation into large-grain units. It then statically analyzes the computation at this granularity to discover when pieces of the computation commute. If all of the operations required to perform a given computation commute, the compiler can automatically generate parallel code. While the resulting parallel program may violate the data dependences of the original serial program, it is still guaranteed to generate the same result.
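As a concrete illustration, here is a minimal C++ sketch (the class and method names are hypothetical, not drawn from the compiler). The two add operations commute: executing them in either order leaves the object in the same final state, because integer addition is commutative and associative, so a computation built entirely out of such operations can be executed in parallel:

    #include <cassert>

    struct Accumulator {
        int count = 0;
        long total = 0;
        void add(int v) { count += 1; total += v; }   // order-insensitive update
    };

    int main() {
        Accumulator a, b;
        a.add(3); a.add(7);   // one execution order
        b.add(7); b.add(3);   // the opposite order
        assert(a.count == b.count && a.total == b.total);   // same final state
        return 0;
    }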

This approach has several interesting properties. Because commutativity analysis does not rely on information about the topology of the manipulated data structures, a compiler that uses it does not need to analyze the parts of the code that build those structures. Commutativity analysis is therefore suitable for incomplete computations, such as applications that manipulate persistent data (e.g. object-oriented database applications) or applications that manipulate geographically dispersed data (e.g. mobile computations on the World Wide Web). In these cases, the code that originally built the data structure may be unavailable or may no longer exist. Commutativity analysis is also particularly well suited to computations that manipulate graphs, which is important because such computations are inherently beyond the reach of data dependence analysis.
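The following sketch, loosely modeled on examples from the commutativity analysis literature (all names are illustrative), shows a graph traversal of this kind. Dependence analysis cannot prove the recursive calls independent because a node may be reachable along several paths, but two visit operations on the same node commute: the sum updates are order-insensitive, and the recursive invocations each operation can generate pass the unmodified field weight rather than the argument, so they are identical in either order:

    struct Node {
        int sum = 0, weight = 1;
        bool marked = false;
        Node *left = nullptr, *right = nullptr;

        void visit(int v) {
            sum += v;                  // commutative update
            if (!marked) {             // successors expanded exactly once
                marked = true;
                if (left)  left->visit(weight);
                if (right) right->visit(weight);
            }
        }
    };

    int main() {
        Node a, b, c;
        a.left = &b; a.right = &c;
        b.left = &c;               // c is reachable along two paths (aliasing)
        a.visit(1);                // any execution order of the visit operations
        return 0;                  // produces the same final sums
    }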

We took a systems-oriented approach to evaluating the ability of commutativity analysis to extract and exploit significant amounts of parallelism in complete applications. We built a parallelizing compiler that uses commutativity analysis as its primary analysis paradigm. This compiler takes as input an unannotated sequential program written in a subset of C++ and generates parallel code for a shared-memory multiprocessor. Using this compiler, we automatically parallelized several applications, including the Barnes-Hut N-body solver, a liquid water simulation code, and a seismic modeling code. For all of these applications, commutativity analysis exposes a substantial amount of concurrency and the generated parallel code exhibits very good performance. On 16 processors, the automatically parallelized version of Barnes-Hut runs between 11 and 12 times faster than the original sequential version; the water simulation application runs between 7 and 8 times faster; and the seismic modeling application runs about 12 times faster.

These experimental results are very encouraging. These applications are very dynamic in nature: they either manipulate sophisticated pointer-based data structures or exhibit very irregular data access patterns. Exploiting coarse-grain parallelism in applications with these characteristics has long been recognized as a very hard problem, and we are not aware of any other compiler or compilation technique capable of extracting significant amounts of concurrency from these computations. Furthermore, the positive experimental results provide concrete evidence of the practical significance of commutativity analysis as an analysis technique for parallelizing compilers.


