JUNE 18–22, 2017

Dr. Nathan DeBardeleben


Dr. Nathan DeBardeleben is a senior research scientist in High Performance Computing Design at Los Alamos National Laboratory and the lead of the Ultrascale Systems Research Center. He is also the laboratory’s lead for supercomputer reliability, fault tolerance, resilience, and dependability. Nathan was a founding member of the DOE’s Resilience Technical Council and runs an international workshop on resilience, Fault Tolerance for HPC at eXtreme Scale (FTXS). His research focuses on studying the reliability of today’s supercomputers and working closely with vendors to improve the systems of tomorrow. He also develops software fault injection tools that allow application designers to test and verify their applications’ resilience to silent data corruption faults.

Speaker at: Fault Tolerance for Next Generation High Performance Computing
Wednesday, June 21, 2017, 01:45 pm - 03:15 pm
  Evaluating Parallel Application Resiliency with the Software Fault Injector, PFSEFI
Wednesday, June 21, 2017, 02:05 pm - 02:25 pm