Stories, Papers, WIKIs

Energy-Precision Tradeoffs in Mobile Graphics Processing Units

Abstract

In mobile devices, limiting the Graphics Processing Unit’s (GPU’s) energy usage is of great importance to extending battery life. This paper focuses on the first stage of the graphics processor pipeline – the vertex transformation stage – and introduces an approach to lowering its switching activity by reducing the precision of arithmetic operations. As a result, the approach enables a tradeoff between energy efficiency and the quality of the rendered image. This paper makes the following specific contributions: 1) a transition-based energy model for quantifying energy consumed as a function of arithmetic precision, and 2) detailed simulation results on several real-world graphics applications to evaluate the tradeoff between energy and precision. In most examples, over 23% of the energy can be saved by lowering arithmetic precision while still maintaining a faithful reproduction of the full-precision image. Pushing the idea further, over 36% of the energy can be saved by further lowering the precision while preserving acceptable result accuracy. We assert that this represents significant energy savings that warrant further investigation and extension of our approach to the remaining stages of the graphics processor pipeline.
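
The link between arithmetic precision and switching activity described in this abstract can be illustrated with a small sketch. This is not the paper's energy model: it simply truncates the mantissa of single-precision values and counts bit flips between consecutive values as a crude proxy for transitions. The function names and the choice of truncation (rather than rounding) are assumptions made for illustration.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def truncate_mantissa(x: float, kept_bits: int) -> float:
    """Keep only the top kept_bits of the 23-bit mantissa (assumed: truncation)."""
    if not 0 <= kept_bits <= 23:
        raise ValueError("kept_bits must be between 0 and 23")
    drop = 23 - kept_bits
    mask = (0xFFFFFFFF >> drop) << drop  # zero out the lowest `drop` bits
    return bits_to_float(float_to_bits(x) & mask)


def count_transitions(values) -> int:
    """Sum of Hamming distances between consecutive 32-bit patterns."""
    total = 0
    prev = None
    for v in values:
        bits = float_to_bits(v)
        if prev is not None:
            total += bin(prev ^ bits).count("1")
        prev = bits
    return total


if __name__ == "__main__":
    # Hypothetical vertex coordinates, used only to exercise the functions.
    coords = [0.1, 0.7, -3.25, 12.5]
    reduced = [truncate_mantissa(c, 8) for c in coords]
    print(count_transitions(coords), count_transitions(reduced))
```

Because truncation masks the same low-order bits in every value, the transition count of the reduced-precision stream can never exceed that of the full-precision stream in this sketch; how that maps to actual energy depends on the hardware and is what the paper's model addresses.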

Accelerating PCG power/ground network solver on GPGPU (IEEE)

Abstract

Fast and precise power/ground (P/G) solvers are critical for robust P/G designs, but traditional serial P/G solvers struggle to handle P/G networks with millions of nodes. Despite the powerful computation capability of parallel hardware, parallel P/G solvers are far from prevalent, especially on complicated special-purpose hardware. Anticipating this need, we studied parallelizing and accelerating P/G solvers on the GPU. We developed a PCG (Preconditioned Conjugate Gradient)-based P/G solver on the CUDA platform for structured P/G networks and identified the advantages as well as the constraints of the GPU architecture.

Our PCG-GPU solver can be up to 40 times faster than SuperLU and also outperforms a multigrid-based solver on the GPU.
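
For readers unfamiliar with PCG, the following is a minimal serial sketch of the preconditioned conjugate gradient method in plain Python. It is not the authors' CUDA solver: the abstract does not say which preconditioner they use, so a Jacobi (diagonal) preconditioner is assumed here, and the small tridiagonal matrix is a hypothetical stand-in for a P/G network conductance matrix.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, x) for row in A]


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Assumption: Jacobi preconditioning, i.e. M^-1 = diag(A)^-1.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [inv_diag[i] * r[i] for i in range(n)]  # preconditioned residual
    p = list(z)  # first search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)  # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz  # makes the new direction A-conjugate to the old
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


if __name__ == "__main__":
    # Hypothetical 3-node resistive chain; the exact solution is [1, 1, 1].
    A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    b = [1.0, 0.0, 1.0]
    print(pcg(A, b))
```

The per-iteration work is dominated by one matrix-vector product and a few vector updates and inner products, which is the kind of data-parallel structure a GPU implementation would exploit.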

Paper available at IEEE.

Analysis of power distribution network in TSV-based 3D-IC (IEEE)

Abstract

To reduce simultaneous switching noise (SSN) in the power distribution network (PDN) design of a TSV-based GPU system, the impedance properties of the hierarchical PDN in the system were estimated and analyzed. The system consisted of triple-stacked TSV-based DRAMs on top of the GPU connected by TSVs, a silicon interposer, and a backside re-distribution layer (BS-RDL).

A segmentation-based impedance-estimation method was used to estimate the total PDN impedance by combining models of the on-chip PDN, the power/ground (P/G) TSV, and the coplanar P/G line in the BS-RDL. The impedance properties of the PDN were also analyzed with respect to variations in the number of P/G TSVs and P/G lines in the BS-RDL and in the capacitance of the decoupling capacitor embedded in the on-chip PDN.

Paper available at IEEE.

An energy model for graphics processing units (IEEE)

Abstract

We present an energy model for a graphics processing unit (GPU) that is based on the amount and type of work performed in various parts of the unit. By designing and running directed tests on a GPU, we measure the energy consumed when performing different arithmetic and memory operations, allowing us to accurately predict the energy that any arbitrary mix of operations will take.

With some knowledge of how data travels through and is transformed by the graphics pipeline, we can predict how many of each operation will occur for a given scene, leading to an estimate of the energy usage. We validate our model against different types of existing graphical applications. With an average difference of 3% from measured energy under typical workloads, our model can be used for various purposes. In this work, we explore and present two use cases: 1) predicting the energy performance of applications on a different architecture, and 2) exploring the energy efficiency of different algorithms to achieve the same graphical effect.
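
The structure of such a work-based model can be sketched as a weighted sum: the predicted energy is the count of each operation multiplied by a measured per-operation energy. The sketch below only illustrates that structure; the operation names and per-operation energies are hypothetical placeholders, not values from the paper.

```python
def estimate_energy(op_counts, energy_per_op):
    """Predicted energy as the sum over operations of count * energy per operation."""
    missing = set(op_counts) - set(energy_per_op)
    if missing:
        raise KeyError(f"no measured energy for operations: {sorted(missing)}")
    return sum(count * energy_per_op[op] for op, count in op_counts.items())


def relative_difference(predicted, measured):
    """Absolute difference between prediction and measurement, relative to the measurement."""
    return abs(predicted - measured) / measured


if __name__ == "__main__":
    # Hypothetical per-operation energies in nanojoules (illustrative only).
    energy_nj = {"fadd": 2, "fmul": 3, "mem_read": 10}
    # Hypothetical operation counts predicted for one scene.
    counts = {"fadd": 100, "fmul": 50, "mem_read": 10}
    print(estimate_energy(counts, energy_nj), "nJ")
```

In this framing, comparing two algorithms for the same graphical effect amounts to comparing their predicted operation counts under the same per-operation energy table.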

Paper available at IEEE.

The effect of varying heat sink fin distances from cooling fan blade tip on noise emissions (IEEE)

Abstract

The challenge to deliver performance improvements in computer graphics cards has surpassed the ability of finned, passive cooling devices to dissipate the heat generated by next-generation graphics processing units (GPUs). The dissipation rates required by these latest GPU designs can only be delivered by more complicated thermal management systems, which often require forced air cooling of finned heat sinks.

The concurrent challenge for the industry is to provide this cooling while minimizing the noise generated by the cooling fans. One of the fundamental mechanisms of fan noise generation is the dynamic force fluctuation on the fan blade and the interaction of these fluctuations with fixed irregularities such as adjacent cooling fins. This study investigates the effect on acoustic emissions of varying the distance between the fan blade tips and the heat sink fins. The measured results are discussed and compared using both traditional analysis techniques and psychoacoustic (sound quality) metrics. It was found that a minimum distance between the blade and adjacent obstructions is desired in order to minimize excessive noise levels. Minimizing the noise emissions also had a desirable effect on the sound quality analysis.

Paper available at IEEE.

In Situ Power Analysis of General Purpose Graphical Processing Units (IEEE)

Abstract

In this paper, an in situ method for profiling power over time on general purpose graphics processing units (GPGPUs) is presented. With this method, the power consumption of different modes of operation can be measured, such as data transfer between GPU and host CPU, basic single-precision floating-point arithmetic operations (addition, subtraction, multiplication) on the multiprocessor units, and instructions for shared and global memory access. There is a factor of 2 difference in power dissipation between various instructions and modes of operation of the GPGPUs. These measurements provide data for an instruction-based power estimation of GPU software. It turns out that the power profile over time also gives a good understanding of which section of the program is executed at a certain point in time. The experimental results have been derived from two GPU architectures, namely the GT200 and the GF100 architecture.
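
A power profile over time can be turned into an energy figure by integrating it. The sketch below does this with the trapezoidal rule over sampled (time, power) pairs. It is an assumption about how such a trace might be post-processed, not the measurement method of the paper, and the sample values are hypothetical.

```python
def energy_joules(times_s, power_w):
    """Integrate a sampled power trace (watts over seconds) with the trapezoidal rule."""
    if len(times_s) != len(power_w):
        raise ValueError("times and power samples must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt < 0:
            raise ValueError("time stamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


if __name__ == "__main__":
    # Hypothetical trace: a flat phase followed by a ramp to a higher power level.
    times = [0.0, 1.0, 2.0]
    power = [100.0, 100.0, 200.0]
    print(energy_joules(times, power), "J")
```

Applying the same integration to the sub-interval in which a given program section runs would give a per-section energy, provided the section boundaries can be identified in the trace.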

Paper available at IEEE.

GTC 2010: Power Management Techniques for Heterogeneous Exascale Computing - Xiaohui Cui

Power consumption has become the leading design constraint for large scale computing systems. In order to achieve exascale computing, system energy efficiency must be improved significantly. Our approach will focus on investigating software methodologies to achieve energy efficient computing on heterogeneous systems accelerated with GPUs.

Power and Performance Characterization of Computational Kernels on the GPU (ACM)

Graphics Processing Units (GPUs) are gaining increasing popularity in high performance computing (HPC). While modern GPUs can offer much more computational power than CPUs, they also consume much more power. Energy efficiency is one of the most important factors that will affect a broader adoption of GPUs in HPC. In this paper, we systematically characterize the power and energy efficiency of GPU computing. Specifically, using three different applications with various degrees of compute and memory intensiveness, we investigate the correlation between power consumption and different computational patterns under various voltage and frequency levels. Our study revealed that energy saving mechanisms on GPUs behave considerably differently from those on CPUs. The characterization results also suggest possible ways to improve the 'greenness' of GPU computing.

Paper available at ACM.

A new physics engine with automatic process distribution between CPU-GPU (ACM)

Graphics Processing Units, or simply GPUs, have evolved into extremely powerful and flexible processors. This flexibility and power have allowed new concepts in general purpose computation to emerge. This paper presents a new architecture for physics engines focusing on the simulation of rigid bodies, with some of its methods implemented on the GPU. Sending physics computation to the GPU offloads the required computations from the CPU, allowing it to process other tasks and optimizations. Another important reason for using the GPU is to allow physics engines to process a higher number of bodies in the simulation. The paper also presents an automatic process distribution scheme between CPU and GPU. The importance of automatic distribution for physics simulation arises from the fact that the characteristics of the simulated scene may change during the simulation; with an automatic distribution scheme, the system may obtain the best performance from both processors (CPU and GPU). Also, with an automatic distribution mode, the developer does not have to decide which processor will do the work, allowing the system to choose between CPU and GPU. This paper also presents an uncoupled multithread game loop used by the phys

Paper available at ACM.

GTC 2010: Towards Peta-Scale Green Computation - Applications of the GPU Supercomputers in the Chinese Academy of Sciences (CAS) - Wei Ge, Xiaowei Wang, Yunquan Zhang, Long Wang

China now holds three spots in the June 2010 Top500 list of GPU-based supercomputers, and two of them, using NVIDIA GPUs, are related to CAS. Efficient use of these systems is more important than peak or Linpack performance. This session will cover some of the large-scale multi-GPU applications in CAS, ranging from molecular dynamics below the nanoscale to complex flows on the meter scale and porous media on geological scales, as well as fundamental linear algebra and data/image analysis. The idea of keeping both high efficiency and generality of the computation platform by maintaining consistency among the target physical system, the computational model and algorithm, and the computer hardware will be explained in detail and demonstrated through a number of supercomputing applications in the chemical, oil, mining, metallurgical and biological industries.