JUNE 18–22, 2017

Presentation Details

Name: (RP12) On-Line Monitoring Framework
Time: Tuesday, June 20, 2017
08:35 am - 09:45 am
Room: Substanz 1+2
Breaks: 07:30 am - 10:00 am Welcome Coffee
Presenter: Eugen Betke, DKRZ
I/O monitoring is important for finding performance inefficiencies in applications. Most monitoring tools rely on one of the following methods. Injecting instrumentation code into an application is intrusive and requires recompiling the application. Collecting I/O data from proc files provides no information about file access. Dynamically linking instrumentation libraries with LD_PRELOAD is typically used to create trace files for post-mortem analysis. In our work, we construct an HPC on-line monitoring framework on top of open source software: SIOX, FUSE, Elasticsearch and Grafana. This framework collects I/O statistics from applications and mount points. The latter can be used for non-intrusive monitoring of virtual memory allocated with mmap(). Instead of gathering I/O statistics from global system variables, as many other monitoring tools do, our approach obtains statistics directly from the I/O interfaces POSIX, MPI, HDF5 and NetCDF.
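
The idea of taking statistics directly from an I/O interface, rather than from system-wide counters, can be illustrated with a minimal Python sketch. It is not the SIOX API and not the authors' implementation: the names `IOStats`, `MonitoredFile` and `demo` are hypothetical, and the wrapper only stands in for an instrumented interface layer. It shows how counting at the call level keeps the per-file detail that aggregated proc-file counters lack.

```python
import os
import tempfile
from collections import defaultdict


class IOStats:
    """Per-file counters collected at the I/O call level (hypothetical helper)."""

    def __init__(self):
        self.counters = defaultdict(
            lambda: {"reads": 0, "bytes_read": 0, "writes": 0, "bytes_written": 0}
        )

    def record_read(self, path, nbytes):
        entry = self.counters[path]
        entry["reads"] += 1
        entry["bytes_read"] += nbytes

    def record_write(self, path, nbytes):
        entry = self.counters[path]
        entry["writes"] += 1
        entry["bytes_written"] += nbytes


class MonitoredFile:
    """Wraps a binary file object and reports every read/write to IOStats.

    This stands in for an instrumented I/O interface; it is an assumption
    made for illustration only.
    """

    def __init__(self, path, mode, stats):
        self._path = path
        self._stats = stats
        self._file = open(path, mode)

    def read(self, size=-1):
        data = self._file.read(size)
        self._stats.record_read(self._path, len(data))
        return data

    def write(self, data):
        written = self._file.write(data)
        self._stats.record_write(self._path, written)
        return written

    def close(self):
        self._file.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


def demo():
    """Write and read a temporary file, then return its per-file counters."""
    stats = IOStats()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.dat")
        with MonitoredFile(path, "wb", stats) as f:
            f.write(b"x" * 1024)
            f.write(b"y" * 512)
        with MonitoredFile(path, "rb", stats) as f:
            f.read(256)
        return dict(stats.counters[path])


if __name__ == "__main__":
    print(demo())
```

In a real deployment the counters would be shipped to a backend for visualization instead of being returned from a function; that step is omitted here.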

Eugen Betke, Deutsches Klimarechenzentrum
Julian Kunkel, Deutsches Klimarechenzentrum

RP12_Betke.pdf (321 KB)