The HPC-Colony project is a joint research effort with Oak Ridge National Laboratory, the IBM T.J. Watson Research Center and Haifa Research Center, and the University of Illinois at Urbana-Champaign to create scalable services and interfaces that permit both high performance and easy application porting for high-performance computing (HPC) systems with very large numbers of processors. Funding for the HPC-Colony Project is provided by a grant from the U.S. Department of Energy Office of Science.
The motivation for the HPC-Colony Project is to make portable performance a reality. Today, domain scientists must expend considerable effort to increase their application's efficiency on a particular machine architecture. Colony is developing system software that dramatically reduces the burden placed upon domain scientists by shifting much of the tuning to adaptive system software. Moreover, the application tuning undertaken to run efficiently on one leadership-class machine can migrate to new machines.
Our approach relies on addressing three critical HPC areas:
Ever-increasing numbers of processors and the inherent restrictions found in today's system software impose artificial barriers upon the capacity of our most capable HPC machines. For developers to scale applications to these new processor counts, work is needed to rid system software of imbalances and scaling shortcomings. Moreover, the arduous task of balancing an application is best accomplished using dynamically enforced schemes with global knowledge -- a new opportunity for system software. Indeed, system software improvements are needed to provide important benefits to users of HPC systems:
The Colony project is developing a coordinated framework using Linux and the Charm++ run-time system to bring about these HPC goals for the benefit of parallel applications.