Performance Portability of a GPU Enabled Factorization with the DAGuE Framework (IEEE)
Performance portability is a major challenge faced today by developers on heterogeneous high performance computers, consisting of an interconnect, memory with non-uniform access, many-cores and accelerators like GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present the GPU subsystem of the DAGuE runtime, and assess, on the Cholesky factorization test case, the minimal efforts required by a programmer to enable GPU acceleration in the DAGuE framework. The performance achieved by this unchanged code, on a variety of heterogeneous and distributed many cores and GPU resources, demonstrates the desired performance portability.
Paper available at IEEE.