DFG Implementation on Multi GPU Cluster with Computation-Communication Overlap (IEEE)

Publication Year: 
2012

Abstract:
Nowadays, it is possible to build a multi-GPU supercomputer, well suited to implementing digital signal processing algorithms, for a few thousand dollars. However, to achieve the highest performance with this kind of architecture, the programmer has to focus on inter-processor communication, task synchronization, … In this paper, we propose a design flow that allows an efficient implementation of a Digital Signal Processing (DSP) application, specified as a Data Flow Graph (DFG), on a multi-GPU computer cluster. We focus particularly on the effective implementation of communications by automating the computation-communication overlap, which can lead to significant speedups, as shown in the presented benchmark. The approach is validated on a 3D granulometry application developed for materials research.

Paper available at IEEE.

Institution: 
GIPSA-lab, UMR5216 CNRS/INPG/UJF/U.Stendhal, F-38402 GRENOBLE CEDEX, France