The Department of Mathematical Sciences and the College of Computing will present a lecture on high-performance computing by Dr. Laura Monroe from the Ultrascale Systems Research Center (USRC) at Los Alamos National Laboratory on Tuesday, September 24, from 5:00 to 6:00 p.m., in Fisher Hall, Room 133. The lecture is titled “The Mathematical Analysis of Faults and the Resilience of Applications.” Discussion will follow the lecture, and pizza and refreshments will be served.
Abstract: As the post-Moore’s-Law era advances, faults are expected to increase in number and in complexity on emerging novel devices. This will happen on exascale and post-exascale architectures due to smaller feature sizes, and also on new devices with unusual fault models. Attention to error-correction and resilience will thus be needed in order to use such devices effectively. Known mathematical error-correction methods may not suffice under these conditions, and an ad hoc approach will not cover the cases likely to emerge, so mathematical approaches will be essential. We will discuss the mathematical underpinnings behind such approaches, illustrate with examples, and emphasize the interdisciplinary approaches that combine experimentation, simulation, mathematical theory and applications that will be needed for success.
Dr. Monroe has spent most of her career focused on unconventional approaches to difficult computing problems, specifically researching new technologies to enable better performance as processor-manufacturing techniques reach the limits of the atomic scale, also known as the end of Moore’s Law. Dr. Monroe received her PhD in the theory of error-correcting codes, working with Dr. Vera Pless. She worked at NASA Glenn, then joined Los Alamos National Laboratory in 2000. She has served on the design teams for the LANL Cielo and Trinity supercomputers, and she originated and leads the Laboratory’s inexact computing project, which is meant to address Moore’s Law challenges in a unique way. She also provides mathematical and theoretical support to LANL’s HPC Resilience project.