Dr. Christian Engelmann is a Senior Scientist and the Intelligent Systems and Facilities Research Group Leader at Oak Ridge National Laboratory (ORNL), the US Department of Energy’s (DOE) largest multiprogram science and technology laboratory, with an annual budget of $2.4 billion. He has more than 22 years of experience in software research and development for extreme-scale high-performance computing (HPC) systems. Dr. Engelmann’s research solves computer science challenges in HPC software, such as scalability, dependability, and interoperability.
His primary expertise is in HPC resilience, i.e., efficiency and correctness in the presence of faults, errors, and failures. Dr. Engelmann is a leading expert in this field and was a member of the DOE Technical Council on HPC Resilience from 2013 to 2015. He received the 2015 DOE Early Career Award for research in resilience design patterns. His secondary expertise is in system software for the instrument-to-edge-to-center computing continuum, enabling science breakthroughs with autonomous experiments, “self-driving” laboratories, smart manufacturing, and AI-driven design, discovery, and evaluation. He also has expertise in lightweight simulation of future-generation extreme-scale supercomputers, studying the impact of hardware/software properties on performance and resilience. In addition, Dr. Engelmann is an expert in system software for parallel and distributed systems.
Dr. Engelmann earned a Dipl.-Ing. (FH) in Computer Systems Engineering from the University of Applied Sciences Berlin, Germany, and an M.Sc. in Computer Science from the University of Reading, UK, both in 2001 as conjoint degrees, and a Ph.D. in Computer Science from the University of Reading in 2008. He is a Senior Member of the Association for Computing Machinery (ACM) and the Institute of Electrical and Electronics Engineers (IEEE). He is also a Member of the Society for Industrial and Applied Mathematics (SIAM) and the Advanced Computing Systems Association (USENIX).
P.O. Box 2008, Oak Ridge, TN 37831-6164, USA
Tel.: +1 (865) 574-3132
Fax: +1 (865) 576-5491
Scopus ID: 18037364000
2021-…: The Open Federated Architecture for the Laboratory of the Future project connects scientific instruments, robot-controlled laboratories, and edge/center computing/data resources to enable autonomous experiments, “self-driving” laboratories, smart manufacturing, and AI-driven design, discovery, and evaluation.
Recently in the News
2021-03-30: DOE Advanced Scientific Computing Research. New Approach to Fault Tolerance Means More Efficient High-Performance Computers.
2021-01-04: HPCwire. What’s New in HPC Research: GPU Lifetimes, the Square Kilometre Array, Support Tickets & More.
2018-11-19: HPCwire. What’s New in HPC Research: Thrill for Big Data, Scaling Resilience and More.
Latest Peer-Reviewed Publications
- C. Engelmann, O. Kuchar, S. Boehm, M. J. Brim, T. Naughton, S. Somnath, S. Atchley, J. Lange, B. Mintz, and E. Arenholz. The INTERSECT Open Federated Architecture for the Laboratory of the Future. In Communications in Computer and Information Science (CCIS): Accelerating Science and Engineering Discoveries Through Integrated Research Infrastructure for Experiment, Big Data, Modeling and Simulation. 18th Smoky Mountains Computational Sciences & Engineering Conference (SMC) 2022, August, 2022. DOI 10.1007/978-3-031-23606-8_11.
- E. Agullo, M. Altenbernd, H. Anzt, L. Bautista-Gomez, T. Benacchio, L. Bonaventura, H. Bungartz, S. Chatterjee, F. M. Ciorba, N. DeBardeleben, D. Drzisga, S. Eibl, C. Engelmann, W. N. Gansterer, L. Giraud, D. Göddeke, M. Heisig, F. Jézéquel, N. Kohl, X. S. Li, R. Lion, M. Mehl, P. Mycek, M. Obersteiner, E. S. Quintana-Ortí, F. Rizzi, U. Rüde, M. Schulz, F. Fung, R. Speck, L. Stals, K. Teranishi, S. Thibault, D. Thönnes, A. Wagner, and B. Wohlmuth. Resiliency in Numerical Algorithm Design for Extreme Scale Simulations. International Journal of High Performance Computing Applications (IJHPCA), volume 36, number 2, March, 2022. DOI 10.1177/10943420211055188.
- M. Kumar and C. Engelmann. RDPM: An Extensible Tool for Resilience Design Patterns Modeling. In Lecture Notes in Computer Science: Proceedings of the 27th European Conference on Parallel and Distributed Computing (Euro-Par) 2021 Workshops: 14th Workshop on Resiliency in High Performance Computing (Resilience) in Clusters, Clouds, and Grids, August, 2021. DOI 10.1007/978-3-031-06156-1_23. Accept. rate 66.7% (4/6).
- M. Kumar, S. Gupta, T. Patel, M. Wilder, W. Shi, S. Fu, C. Engelmann, and D. Tiwari. Study of Interconnect Errors, Network Congestion, and Applications Characteristics for Throttle Prediction on a Large Scale HPC System. Journal of Parallel and Distributed Computing (JPDC), volume 153, July, 2021. DOI 10.1016/j.jpdc.2021.03.001.
- S. Hukerikar and C. Engelmann. PLEXUS: A Pattern-Oriented Runtime System Architecture for Resilient Extreme-Scale High-Performance Computing Systems. In Proceedings of the 25th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC) 2020, December, 2020. DOI 10.1109/PRDC50213.2020.00014. Accept. rate 40.9% (18/44).