Presentation Details |
Name: | (RP02) Implementation & Evaluation of 2.5D Matrix Multiplication on K Computer |
Time: | Tuesday, June 20, 2017 08:35 am - 09:45 am |
Room: | Substanz 1+2 |
Breaks: | 07:30 am - 10:00 am Welcome Coffee |
Presenter: | Daichi Mukunoki, RIKEN AICS |
Abstract: | Performance improvements in recent supercomputers rely on increasing parallelism, such as the number of nodes or cores. In such highly parallel environments, a computation can become communication-bound when the problem size per process is not large enough, and communication-avoiding techniques are required to improve strong-scaling performance. The 2.5D algorithm for parallel matrix multiplication has been proposed as one such technique. In this study, we implemented 2.5D parallel matrix multiplication using the SUMMA algorithm and evaluated its performance on up to 16384 nodes of the K computer, a highly parallel supercomputer installed at RIKEN AICS, Japan. A notable point of this study is that our implementation is designed to perform the 2.5D algorithm on 2D-distributed matrices on a 2D process grid, and it outperforms conventional 2D implementations (ScaLAPACK PDGEMM and 2D-SUMMA) even when the cost of redistributing the matrices between the 2D and 2.5D distributions is included. The study also presents a detailed performance analysis of the 2.5D implementation by breaking down the execution time. |
Authors: | Daichi Mukunoki, RIKEN Advanced Institute for Computational Science; Toshiyuki Imamura, RIKEN Advanced Institute for Computational Science |
Download | RP02_Mukunoki.pdf (341 KB) |
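For readers unfamiliar with the 2.5D idea named in the abstract, the following is a minimal serial sketch, not the authors' implementation. It assumes a logical p x p x c process grid in which each of the c layers runs p/c of the p SUMMA steps on its own copy of the matrices, and the per-layer partial results are then summed. The function name `summa_25d` and its parameters are hypothetical. No MPI communication is performed, and the 2D-to-2.5D redistribution discussed in the abstract is not modeled.

```python
def summa_25d(A, B, p, c):
    """Serially simulate 2.5D SUMMA for square n x n matrices (lists of lists).

    Assumptions of this sketch: a logical p x p x c process grid, n divisible
    by p, and p divisible by c. Setting c = 1 gives ordinary 2D SUMMA.
    """
    n = len(A)
    if n % p != 0 or p % c != 0:
        raise ValueError("sketch requires n divisible by p and p divisible by c")
    b = n // p        # edge length of the block owned by one process
    steps = p // c    # number of SUMMA steps each layer performs

    # One partial result per layer; a real run would hold only local blocks.
    partial = [[[0.0] * n for _ in range(n)] for _ in range(c)]

    for layer in range(c):
        C = partial[layer]
        # Each layer handles a disjoint slice of the k loop.
        for k in range(layer * steps, (layer + 1) * steps):
            # In a distributed run, block column k of A would be broadcast
            # along process rows and block row k of B along process columns.
            for i in range(p):
                for j in range(p):
                    # Local update on process (i, j, layer):
                    # C[i, j] += A[i, k] * B[k, j] on b x b blocks.
                    for r in range(i * b, (i + 1) * b):
                        for s in range(j * b, (j + 1) * b):
                            C[r][s] += sum(
                                A[r][t] * B[t][s]
                                for t in range(k * b, (k + 1) * b)
                            )

    # Sum-reduction across layers yields the final product.
    return [
        [sum(partial[layer][r][s] for layer in range(c)) for s in range(n)]
        for r in range(n)
    ]
```

In this model each layer takes part in p/c broadcast steps instead of p, at the cost of storing c copies of the inputs and one extra reduction across layers.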