Abstract: |
|
Over the last few years, the resilience question has evolved from an open
debate into a more pragmatic one: the occurrence of faults is no longer
in doubt, and the focus is instead on how frequently such radical events
arise during the execution of applications at scale. Solutions that
transparently manage faults at the system level exist, each with its
benefits and drawbacks.
Empowering developers to deal with failure events themselves is a
much more revolutionary approach, one that offers greater opportunities
for efficiency but needs holistic support from all layers: hardware,
software, and the parallel programming paradigm. This talk will highlight
application-driven techniques to survive faults and their expected costs
at scale, as well as the necessary support from the programming paradigms
and their runtimes. |
|