Evaluating Parallel Application Resiliency with the Software Fault Injector, PFSEFI
Time:
Wednesday, June 21, 2017 02:05 pm - 02:25 pm
Room:
Panorama 1 Messe Frankfurt
Speaker:
Nathan DeBardeleben, Los Alamos National Laboratory
Abstract:
Application resiliency to faults is a concern as supercomputers grow to
ever larger sizes while the semiconductor industry shrinks components
and carefully reduces voltage to minimize power use. Users of today's
supercomputers need to plan for tomorrow's systems, and the parallel
software fault injection tool, PFSEFI, can help by evaluating application
resiliency
and vulnerability. In this talk we will discuss PFSEFI, show how
insights gained from using the tool can quantify application
vulnerability to silent data corruption, and look at techniques to
improve application resilience.