Presentation Details
Name: (RP03) A Portable Distributed Sparse Grid Density Estimation for Big Data Clustering
Time: Tuesday, June 20, 2017 08:35 am - 09:45 am
Room: Substanz 1+2
Breaks: 07:30 am - 10:00 am Welcome Coffee
Presenter: David Pfander, University of Stuttgart
Abstract: The clustering of data points is one of the central tasks in data mining. Big Data scenarios with millions to billions of data points require highly efficient algorithms. We present an accelerator-enabled distributed clustering algorithm based on a spatial discretization using sparse grids. The algorithm uses a density estimate of the dataset to prune a nearest neighbor graph of the dataset. A key benefit of sparse grid density estimation is that it scales linearly in the size of the dataset, which makes it well-suited for vast datasets. We have implemented the algorithm in OpenCL so that it efficiently exploits CPUs and accelerator cards from different vendors. First results show good scaling behavior on 64 nodes of Piz Daint, a large Nvidia Pascal installation, for synthetic datasets with up to 10 dimensions and 10 million data points. At the node level, we achieve between 23% and 50% of the peak performance on hardware platforms from different vendors. Because the instruction mix limits us to two thirds of the peak performance, this corresponds to up to 76% of the practically achievable peak performance. Our approach displays good scalability, high node-level performance and performance portability.
Authors: David Pfander, Universität Stuttgart; Gregor Daiß, Universität Stuttgart; Dirk Pflüger, Universität Stuttgart
Download: RP03_Pfander.pdf (14147 KB)
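
The abstract's core idea, pruning a nearest neighbor graph with a density estimate, can be illustrated with a minimal sketch. It is not the authors' implementation. A plain Gaussian kernel density estimate stands in for the sparse grid density estimate, the graph is built by brute force, and the pruning rule (drop nodes and edge midpoints whose density falls below a threshold) is an assumption made for illustration. All function names and the `bandwidth` and `threshold` parameters are hypothetical.

```python
import math
from collections import defaultdict


def make_density(points, bandwidth):
    """Gaussian kernel density estimate (unnormalized).

    Assumption: this replaces the sparse grid density estimate of the talk.
    """
    def density(x):
        total = 0.0
        for p in points:
            sq = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq / (2.0 * bandwidth ** 2))
        return total / len(points)
    return density


def knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph as a list of neighbor index lists."""
    graph = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in dists[:k]])
    return graph


def prune_and_cluster(points, k, bandwidth, threshold):
    """Label each point with a cluster index, or -1 if it lies in a low-density region."""
    density = make_density(points, bandwidth)
    rho = [density(p) for p in points]

    # Hypothetical pruning rule: keep an edge only if both endpoints and
    # the edge midpoint have a density of at least `threshold`.
    adjacency = defaultdict(set)
    for i, neighbors in enumerate(knn_graph(points, k)):
        if rho[i] < threshold:
            continue
        for j in neighbors:
            if rho[j] < threshold:
                continue
            mid = tuple((a + b) / 2.0 for a, b in zip(points[i], points[j]))
            if density(mid) < threshold:
                continue
            adjacency[i].add(j)
            adjacency[j].add(i)

    # The connected components of the pruned graph are the clusters.
    labels = [-1] * len(points)
    cluster = 0
    for start in range(len(points)):
        if labels[start] != -1 or rho[start] < threshold:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adjacency[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    stack.append(v)
        cluster += 1
    return labels
```

The brute-force graph and the kernel sum both cost quadratic time in the number of points, so this sketch only conveys the pruning idea and none of the linear scaling that the abstract attributes to sparse grids.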