Blocked All-Pairs Shortest Paths Algorithm for Hybrid CPU-GPU System (ACM)
This paper presents a blocked algorithm for the all-pairs shortest paths (APSP) problem for a hybrid CPU-GPU system. In the blocked APSP algorithm, the amount of data communication between CPU (host) memory and GPU memory is minimized. When a problem size (the number of vertices in a graph) is large enough compared with a blocking factor, the blocked algorithm virtually requires CPU$ ightleftharpoons$GPU exchanging of two block matrices for a block computation on the GPU. We also estimate a required memory/communication bandwidth to utilize the GPU efficiently. On a system containing an Intel West mere CPU (Core i7 970) and an AMD Cypress GPU (Radeon HD 5870), our implementation of the blocked APSP algorithm achieves the performance up to 1 TFlop/s in single precision.
Paper available at ACM.