JUNE 18–22, 2017

Presentation Details

Name: (RP02) Implementation & Evaluation of 2.5D Matrix Multiplication on K Computer
Time: Tuesday, June 20, 2017
08:35 am - 09:45 am
Room: Substanz 1+2
Breaks: 07:30 am - 10:00 am Welcome Coffee
Presenter: Daichi Mukunoki, RIKEN AICS
Performance improvements in recent supercomputers rely on increasing parallelism, such as the number of nodes or cores. In such highly parallel environments, a computation can become communication-bound when the problem size per process is not large enough, and communication-avoiding techniques are required to improve strong-scaling performance. The 2.5D algorithm for parallel matrix multiplication has been proposed as one such technique. In this study, we implemented a 2.5D parallel matrix multiplication based on the SUMMA algorithm and evaluated its performance on the K computer, a highly parallel supercomputer installed at RIKEN AICS, Japan, using up to 16384 nodes. A notable point of this study is that our implementation is designed to perform the 2.5D algorithm on 2D-distributed matrices on a 2D process grid, and it outperforms conventional 2D implementations (ScaLAPACK PDGEMM and 2D-SUMMA) even when the cost of redistributing the matrices between the 2D and 2.5D distributions is included. This study also presents a detailed performance analysis of the 2.5D implementation, showing a breakdown of the execution time.
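
For readers unfamiliar with the technique, the 2.5D idea can be illustrated with a minimal serial sketch in Python/NumPy. This is not the authors' implementation and uses no MPI: it only simulates a logical q x q x c process grid in a single process. The function name `summa_25d` and the parameters `q` (2D grid dimension) and `c` (number of replicated layers) are hypothetical names chosen for illustration. Each layer is assumed to hold a full replica of the 2D-distributed matrices, perform q/c of the q SUMMA outer-product steps, and contribute a partial result that is summed across layers at the end.

```python
import numpy as np

def summa_25d(A, B, q, c):
    """Serially simulate 2.5D SUMMA on a logical q x q x c process grid.

    Assumptions (illustrative only): square n x n matrices, n divisible by q,
    and q divisible by c. No real communication takes place.
    """
    n = A.shape[0]
    assert A.shape == (n, n) and B.shape == (n, n)
    assert n % q == 0 and q % c == 0
    b = n // q                   # edge length of the block owned by one process
    steps_per_layer = q // c     # SUMMA steps assigned to each layer

    # partial[l] is the partial product accumulated by layer l.
    partial = np.zeros((c, n, n))
    for layer in range(c):
        first = layer * steps_per_layer
        for k in range(first, first + steps_per_layer):
            for i in range(q):
                for j in range(q):
                    # In a distributed code, block A(i,k) would be broadcast
                    # along process row i and block B(k,j) along column j.
                    A_ik = A[i*b:(i+1)*b, k*b:(k+1)*b]
                    B_kj = B[k*b:(k+1)*b, j*b:(j+1)*b]
                    partial[layer, i*b:(i+1)*b, j*b:(j+1)*b] += A_ik @ B_kj

    # Stand-in for the reduction across the c layers.
    return partial.sum(axis=0)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
B = rng.standard_normal((8, 8))
C = summa_25d(A, B, q=4, c=2)
print(np.allclose(C, A @ B))
```

In this sketch, setting c = 1 reduces to ordinary 2D SUMMA. With c > 1, each layer performs fewer broadcast steps at the price of storing c copies of the matrices, which is the memory-for-communication trade-off that the 2.5D approach relies on.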

Daichi Mukunoki, RIKEN Advanced Institute for Computational Science
Toshiyuki Imamura, RIKEN Advanced Institute for Computational Science

RP02_Mukunoki.pdf (341 KB)