A compiler for parallel execution of numerical Python programs on graphics processing units
Modern graphics processing units (GPUs) provide breakthrough performance for numerical computing, at the cost of increased programming complexity. Current programming models for GPUs require the programmer to manage data transfers between the CPU and the GPU manually. This thesis proposes a simpler programming model and introduces a new compilation framework that enables Python applications containing numerical computations to be executed on GPUs and multi-core CPUs.
The new programming model minimally extends Python to include type and parallel-loop annotations. Our compiler framework then automatically identifies the data to be transferred between the main memory and the GPU for a particular class of affine array accesses. The compiler also automatically performs loop transformations to improve performance on GPUs.
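
To make the notion of an affine array access concrete, the sketch below computes which elements of an array a loop touches when every index has the form coeff*i + offset. It is only an illustration of the general idea under simplifying assumptions (one-dimensional arrays, a single loop variable, unit loop stride); the helper name affine_access_bounds is hypothetical and is not taken from the compiler described in this thesis.

```python
def affine_access_bounds(coeff, offset, lo, hi):
    """Inclusive (min, max) index touched by A[coeff*i + offset]
    for i in range(lo, hi). Hypothetical helper for illustration only."""
    if hi <= lo:
        raise ValueError("empty loop range")
    # An affine function of i is monotonic, so its extremes over the
    # loop range occur at the first and last iterations.
    first = coeff * lo + offset
    last = coeff * (hi - 1) + offset
    return (min(first, last), max(first, last))


# Example loop:  for i in range(1, 9): B[i] = A[i - 1] + A[i + 1]
# The two reads of A have (coeff, offset) = (1, -1) and (1, +1).
reads = [
    affine_access_bounds(1, -1, 1, 9),
    affine_access_bounds(1, 1, 1, 9),
]

# The union of the read ranges bounds the part of A that would have
# to be present in GPU memory before the loop runs.
lo_idx = min(r[0] for r in reads)
hi_idx = max(r[1] for r in reads)
print(f"transfer A[{lo_idx}:{hi_idx + 1}]")
```

Because the index expressions are affine, the bounds follow from the loop limits alone, without executing the loop body. This is what makes such a class of accesses amenable to automatic analysis.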
For kernels with regular loop structure and simple memory access patterns, the GPU code generated by the compiler achieves significant performance improvement over multi-core CPU codes.