This work is part of the U.S. Department of Energy’s Institute for Advanced Architectures and Algorithms (IAA). The IAA was established in 2008 to facilitate the co-design of architectures and applications, so that the two evolve in synergy and close the gap between the peak capabilities of the hardware and the performance that high performance computing applications actually realize (the application-architecture performance gap).
This project focuses on the development of architecture-aware algorithms and the supporting runtime features needed by these algorithms to solve general sparse linear systems common in many scientific applications. Targeted architecture-aware algorithms include (1) multi-precision Krylov solvers, preconditioners, and multi-level smoothers, (2) multi-resolution, multi-precision fast Poisson and Helmholtz solvers, (3) multi-core aware hybrid algorithms for preconditioning, and (4) parallel-in-time algorithms based on Krylov Deferred Correction. Targeted features within an architecture-aware runtime environment include multi-core aware Message Passing Interface (MPI) memory allocation, multi-level MPI communicators, and process-to-core and memory-to-core affinity.
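To illustrate what "multi-precision" means for the solvers listed above, here is a minimal sketch of mixed-precision iterative refinement. It is not the project's implementation: a plain Jacobi iteration stands in for the Krylov solver, single precision is emulated by rounding through the standard `struct` module, and the function names `to_f32`, `jacobi_f32`, and `refine` are hypothetical. The idea shown is that the expensive inner solve runs in low precision, while residuals and solution updates are kept in double precision.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def jacobi_f32(A, b, sweeps=20):
    """Approximately solve A x = b with Jacobi sweeps in emulated single precision.

    Assumes A is strictly diagonally dominant so that Jacobi converges.
    Products and updated entries are rounded to float32.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        x_new = []
        for i in range(n):
            s = sum(to_f32(A[i][j] * x[j]) for j in range(n) if j != i)
            x_new.append(to_f32((b[i] - s) / A[i][i]))
        x = x_new
    return x


def refine(A, b, tol=1e-12, max_outer=20):
    """Mixed-precision iterative refinement (illustrative sketch).

    Outer loop (double precision): compute the residual r = b - A x.
    Inner solve (single precision): approximate the correction d from A d = r.
    Update (double precision): x = x + d.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_outer):
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol:
            break
        d = jacobi_f32(A, r)
        x = [x[i] + d[i] for i in range(n)]
    return x


if __name__ == "__main__":
    # Small diagonally dominant test system whose exact solution is [1, 1, 1].
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [5.0, 6.0, 5.0]
    print(refine(A, b))
```

Each low-precision solve is only accurate to roughly single precision, but because the residual is recomputed in double precision, repeated corrections can drive the error well below what the inner solver achieves on its own.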
This project further focuses on evaluating the algorithmic impact of future architecture choices and determining what architecture changes would have the highest impact. The evaluation includes (1) detailed performance analyses of key computational kernels on different simulated node architectures, (2) analysis and development of new memory access capabilities that may improve use of memory bandwidth and cache memory resources, and (3) simulation of system architectures at full scale to evaluate the scalability and fault tolerance behavior of key science algorithms.
Prominent Solutions
Funding Sources
- Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy
- National Nuclear Security Administration, U.S. Department of Energy
Participating Institutions
- Oak Ridge National Laboratory
- Sandia National Laboratories
- University of Minnesota
- University of Maryland
Peer-reviewed Conference Publications
- Swen Böhm and Christian Engelmann. xSim: The Extreme-Scale Simulator. In Proceedings of the International Conference on High Performance Computing and Simulation (HPCS) 2011, pages 280-286, Istanbul, Turkey, July 4-8, 2011. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-1-61284-383-4. DOI 10.1109/HPCSim.2011.5999835. Acceptance rate 28.1% (48/171).
Peer-reviewed Workshop Publications
- Ian S. Jones and Christian Engelmann. Simulation of Large-Scale HPC Architectures. In Proceedings of the 40th International Conference on Parallel Processing (ICPP) 2011: 2nd International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI), pages 447-456, Taipei, Taiwan, September 13-19, 2011. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-0-7695-4511-0. ISSN 1530-2016. DOI 10.1109/ICPPW.2011.44.
- Christian Engelmann and Frank Lauer. Facilitating Co-Design for Extreme-Scale Systems Through Lightweight Simulation. In Proceedings of the 12th IEEE International Conference on Cluster Computing (Cluster) 2010: 1st Workshop on Application/Architecture Co-design for Extreme-scale Computing (AACEC), pages 1-8, Hersonissos, Crete, Greece, September 20-24, 2010. IEEE Computer Society, Los Alamitos, CA, USA. ISBN 978-1-4244-8395-2. DOI 10.1109/CLUSTERWKSP.2010.5613113.
Talks and Lectures
- Christian Engelmann. Resilience and Hardware/Software Co-design for Extreme-Scale Supercomputing. Seminar at the Barcelona Supercomputing Center, Barcelona, Spain, July 27, 2011.
- Christian Engelmann. Beyond Application-Level Checkpoint/Restart – Advanced Software Approaches for Fault Resilience. Talk at the 39th SPEEDUP Workshop on High Performance Computing, Zurich, Switzerland, September 6, 2010.
- Christian Engelmann and Stephen L. Scott. HPC System Software Research at Oak Ridge National Laboratory. Seminar at the Leibniz Rechenzentrum (LRZ), Garching, Germany, February 22, 2010.
- Christian Engelmann. High-Performance Computing Research Internship and Appointment Opportunities at Oak Ridge National Laboratory. Seminar at the Department of Computer Science, University of Reading, Reading, United Kingdom, December 14, 2009.
- Christian Engelmann. JCAS – IAA Simulation Efforts at Oak Ridge National Laboratory. Invited talk at the IAA Workshop on HPC Architectural Simulation (HPCAS), Boulder, CO, USA, September 1-2, 2009.
Co-advised Theses
- Ian S. Jones. Simulation of Large Scale Architectures on High Performance Computers. Master’s thesis, Department of Computer Science, University of Reading, UK, October 22, 2010. Thesis research performed at Oak Ridge National Laboratory. Advisors: Prof. Vassil N. Alexandrov (University of Reading); Christian Engelmann (Oak Ridge National Laboratory); George Bosilca (University of Tennessee, Knoxville).
- Frank Lauer. Simulation of Advanced Large-Scale HPC Architectures. Master’s thesis, Department of Computer Science, University of Reading, UK, March 12, 2010. Thesis research performed at Oak Ridge National Laboratory. Advisors: Prof. Vassil N. Alexandrov (University of Reading); Christian Engelmann (Oak Ridge National Laboratory); George Bosilca (University of Tennessee, Knoxville).