Scalable caching techniques for a weakly coherent memory



There is a growing acceptance that general purpose parallel computing requires the use of a scalable shared memory environment. The Cray T3D, IBM SP2 and Intel Paragon message passing machines support a scalable interconnect for hundreds or thousands of processors, with linear increases in bisection bandwidth as the number of processors grows. Supporting a shared address space on these machines results in a two-level memory hierarchy, in which data are either local or shared across the machine. The next few years will see a trend towards cache-coherent multiprocessors, using the techniques employed by machines such as the KSR (cache-only memory) and the DASH (distributed directories). This will simplify the programming model by providing a single-level memory hierarchy. This paper describes a highly scalable caching technique targeted at a weakly coherent form of shared memory, as supported by the WPRAM computational model: a processor wishing to read newly written shared data must explicitly synchronise in some way with the writer of that data. The example provided supports coherency for barrier synchronisation operations, but can be extended to other forms of synchronisation. A case study using the simplex method for linear programming is given. Results are based on a simulation of a scalable distributed memory machine.
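
To make the weak coherence rule concrete, the following is a minimal single-threaded sketch of barrier-based coherency. It is an illustration under stated assumptions, not the caching technique described in this paper: the names SharedMemory, Processor and barrier are hypothetical, writes are assumed to be buffered locally until the barrier, and every cache is assumed to be invalidated in full at the barrier, which is the simplest possible policy.

```python
class SharedMemory:
    """Global store standing in for the machine's shared memory (hypothetical)."""

    def __init__(self):
        self.data = {}


class Processor:
    """A processor with a private cache and a buffer of not-yet-visible writes."""

    def __init__(self, memory):
        self.memory = memory
        self.cache = {}    # locally cached copies, possibly stale
        self.pending = {}  # writes not yet visible to other processors

    def read(self, addr):
        # A processor always sees its own writes.
        if addr in self.pending:
            return self.pending[addr]
        # Otherwise serve from the cache, fetching from shared memory on a miss.
        if addr not in self.cache:
            self.cache[addr] = self.memory.data.get(addr)
        return self.cache[addr]

    def write(self, addr, value):
        # Assumption: writes stay local until the next barrier.
        self.pending[addr] = value


def barrier(processors):
    """Simulated barrier: publish all pending writes, then drop all cached copies."""
    for p in processors:
        p.memory.data.update(p.pending)
        p.pending.clear()
    for p in processors:
        p.cache.clear()


memory = SharedMemory()
memory.data["x"] = 0
writer, reader = Processor(memory), Processor(memory)

before_write = reader.read("x")    # reader caches the initial value
writer.write("x", 1)
before_barrier = reader.read("x")  # still the cached copy: no synchronisation yet
barrier([writer, reader])
after_barrier = reader.read("x")   # re-fetched after the barrier
```

In this sketch the reader keeps seeing its cached copy until both processors pass the barrier, which is the behaviour the weak coherence rule permits. No coherence traffic is generated between synchronisation points.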