Object-oriented Stream Programming using Aspects: A High-Productivity Programming Paradigm for Hybrid Platforms (ACM)
The move to massively parallel hybrid platforms, such as multicore CPUs accelerated with heterogeneous GPU co-processing systems, is significantly impacting software programmers because existing programs have to be properly parallelized before they can take advantage of these advanced processing architectures. However, using current programming frameworks such as CUDA leads to tangled source code that combines code for the core computation with that for device and computational kernel management, data transfers between memory spaces, and various optimizations. In this research, we propose a programming system based on the principles of Aspect-Oriented Programming, to un-clutter the code and to improve programmability of these heterogeneous parallel systems. Specifically, we use a standard Object-Oriented language to describe the core computations and aspects to encapsulate all other support functions, such as parallelization granularity and memory access optimization. An aspect-weaving compiler is then used to combine the core OO program with these aspects to generate parallelized programs. This approach modularizes concerns that are hard to manage using conventional programming frameworks such as CUDA, has a small impact on existing program structure as well as performance, and as a result, simplifies the programming of accelerator-based heterogeneous parallel systems. Studies on example programs suggest that programs written using this system can be successfully translated to CUDA programs for execution on a CPU + GPU co-processing system with comparable performance. The performance of the translated code achieved ∼80% of the hand-coded CUDA programs.
We also introduce a performance model based on Bulk Synchronous Parallel (BSP) to help with quick identification of performance bottlenecks and tuning programs for better performance. This model defines a machine parameter (Machine Characteristic Ratio) and an application parameter (Application Characteristic Ratio) to identify the principle factors that can be used to bound application performance for the hierarchical parallel execution in the GPU co-processing device.
Paper available at ACM.