Functionality Distribution for Parallel Rendering (ACM)
Handling very large datasets has been a key problem in real-time distributed rendering research. With the advent of the programmable Graphics Processing Unit (GPU), it is now possible and even profitable to move many application-specific computations to the GPU. It has been shown that modern GPUs outperform standard PC-platform CPUs on a broad class of computations by over a factor of seven. Given the low cost and high processing speed of GPUs, there is a trend towards using clusters of CPU/GPU systems. Configuring and programming these clusters for efficient distribution of data and computations is a major challenge. Which computations can be offloaded from the CPU to a GPU? The answer is not simple, as it depends on four factors: the GPU's processing capacity, the GPU's internal bandwidth, the GPU-CPU communication bandwidth, and the external network bandwidth. All of these factors are subject to change with every generation of hardware. Additions and alternatives to the traditional data-parallel architectures are therefore needed to exploit the full capability of such clusters through functional parallelism. In this paper, we present a number of architectural configurations that could be adopted on such clusters. Specifically, we demonstrate the use of one such architecture: a GPU-based pipelined architecture applied to our work on real-time processing and rendering of large point datasets, which demands complex computations. We also introduce a list of application and system parameters that are necessary to determine an optimal distribution of computation on the GPUs of a graphics cluster.
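To make the role of the four factors concrete, the following is a minimal sketch of an offloading estimate, not the model used in the paper. Every name in it (`StageCost`, `Hardware`, `gpu_time`, `should_offload`) and the cost formula are assumptions made for illustration: a stage is taken to be limited by whichever is slower of GPU arithmetic and GPU memory traffic, with CPU-GPU transfer and optional network transfer added on top.

```python
from dataclasses import dataclass


@dataclass
class StageCost:
    """Hypothetical description of one computation stage (illustrative only)."""
    flops: float          # arithmetic work, in floating-point operations
    bytes_in: float       # data uploaded from CPU to GPU
    bytes_out: float      # results read back from GPU to CPU
    bytes_touched: float  # data read or written in GPU memory while computing


@dataclass
class Hardware:
    """Hypothetical machine parameters; the last four mirror the four factors."""
    cpu_flops: float   # sustained CPU throughput, FLOP/s
    gpu_flops: float   # GPU processing capacity, FLOP/s
    gpu_mem_bw: float  # GPU internal bandwidth, bytes/s
    bus_bw: float      # GPU-CPU communication bandwidth, bytes/s
    net_bw: float      # external network bandwidth, bytes/s


def cpu_time(stage: StageCost, hw: Hardware) -> float:
    """Estimated seconds to run the stage on the CPU."""
    return stage.flops / hw.cpu_flops


def gpu_time(stage: StageCost, hw: Hardware, remote: bool = False) -> float:
    """Estimated seconds to run the stage on a GPU.

    Assumption: compute is bounded by the slower of arithmetic and GPU memory
    traffic, and transfers are not overlapped with compute.
    """
    compute = max(stage.flops / hw.gpu_flops,
                  stage.bytes_touched / hw.gpu_mem_bw)
    moved = stage.bytes_in + stage.bytes_out
    total = compute + moved / hw.bus_bw
    if remote:
        # The data must also cross the cluster network to reach another node.
        total += moved / hw.net_bw
    return total


def should_offload(stage: StageCost, hw: Hardware, remote: bool = False) -> bool:
    """True when the GPU estimate beats the CPU estimate."""
    return gpu_time(stage, hw, remote) < cpu_time(stage, hw)


# Illustrative numbers only; the 7x GPU/CPU ratio echoes the factor quoted above.
hw = Hardware(cpu_flops=1e10, gpu_flops=7e10, gpu_mem_bw=1e11,
              bus_bw=1e9, net_bw=1e8)
heavy = StageCost(flops=1e10, bytes_in=1e6, bytes_out=1e6, bytes_touched=1e8)
light = StageCost(flops=1e7, bytes_in=1e8, bytes_out=0.0, bytes_touched=1e8)

print("offload heavy stage:", should_offload(heavy, hw))
print("offload light stage:", should_offload(light, hw))
```

Under these assumed numbers, the arithmetic-heavy stage favours the GPU, while the transfer-heavy stage does not, because moving its input across the CPU-GPU bus costs more than computing it on the CPU. Changing any one of the four bandwidth or capacity values can flip the decision, which is why the distribution has to be revisited for each hardware generation.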
Paper available at ACM.