Stories, Papers, WIKIs

GPU-Piv

Abstract:
Digital Particle Image Velocimetry (PIV) is an optical technique used to measure the velocity of seeded particles in a real flow. A CCD camera captures the flow field twice under exposure to a short-duration laser flash. Recorded image pairs are cross-correlated to extract velocity information. Time-resolved PIV technology can capture several hundred frames per second.
In this paper, we present a PIV system that implements vector field reconstruction and visualization on programmable graphics processing units (GPUs), thus providing a high-speed back-end for time-resolved PIV technology. We propose an efficient FFT implementation on such hardware, which is used to cross-correlate multiple pairs of interrogation windows. To visualize extracted vector fields, we employ functionality to create and render geometry data on the GPU. In this way, not only can any data transfer between the CPU and the GPU be avoided, but spatial information derived from PIV as well as the time history of points in the flow can be combined instantaneously.
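
The cross-correlation step mentioned in the abstract can be illustrated on the CPU. The following is a minimal sketch, not the paper's GPU implementation: it assumes square interrogation windows whose side is a power of two, uses circular (wrap-around) correlation, and all function names (`fft`, `fft2`, `displacement`) are hypothetical helpers written for this example.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft2(a, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def displacement(win_a, win_b):
    """Return the (row, col) shift of win_b relative to win_a.

    Circular cross-correlation: corr = IFFT(FFT(b) * conj(FFT(a))).
    The inverse transform is left unscaled because only the peak
    position matters here.
    """
    n, m = len(win_a), len(win_a[0])
    fa, fb = fft2(win_a), fft2(win_b)
    prod = [[fb[i][j] * fa[i][j].conjugate() for j in range(m)]
            for i in range(n)]
    corr = fft2(prod, inverse=True)
    _, di, dj = max((corr[i][j].real, i, j)
                    for i in range(n) for j in range(m))
    # Peak indices past the midpoint wrap around to negative shifts.
    if di > n // 2:
        di -= n
    if dj > m // 2:
        dj -= m
    return di, dj

# Synthetic example: three "particles", second frame shifted by (2, -1).
n = 8
frame_a = [[0.0] * n for _ in range(n)]
for i, j in [(1, 2), (4, 5), (6, 1)]:
    frame_a[i][j] = 1.0
frame_b = [[frame_a[(i - 2) % n][(j + 1) % n] for j in range(n)]
           for i in range(n)]
print(displacement(frame_a, frame_b))  # prints (2, -1)
```

Dividing the recovered pixel shift by the time between the two laser flashes would give a velocity estimate for that window; a real system would also handle sub-pixel peak fitting and non-periodic window edges, which this sketch omits.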

GPU-based simulation of side-looking sonar images (IEEE)

Abstract

This paper describes an implementation of a sonar image simulator optimized for running on a computer's Graphics Processing Unit (GPU). GPUs are hardware-optimized to obtain maximum performance on computer graphics applications. Because these applications generally simulate focal plane images (e.g. optical systems, video, etc.), some specific adaptations are required to render range images such as those generated by a sonar sensor. Considerations for the simulation of side-scan and synthetic aperture images are discussed, including a thorough explanation of the imaging geometry. The use of the GPU to implement the sonar simulation process is described in detail, including the different render stages required to construct the final image and the addition of more refined effects such as noise, multi-path and the sensor's point spread function (PSF). Other advanced uses of the GPU simulator core are discussed, such as Fast Fourier Transform (FFT) computation and fast image correlation. Finally, results and performance figures are presented.

Paper available at IEEE.

GPU-Based FFT Computation for Multi-Gigabit WirelessHD Baseband Processing (ACM)

Abstract:
The next generation of Graphics Processing Units (GPUs) is being considered for non-graphics applications. Millimeter wave (60 GHz) wireless networks that are capable of multi-gigabit per second (Gbps) transfer rates require a significant baseband throughput. In this work, we consider the baseband of WirelessHD, a 60 GHz communications system, which can provide a data rate of up to 3.8 Gbps over a short-range wireless link. We therefore explore the feasibility of achieving gigabit baseband throughput using GPUs. One of the most computationally intensive functions commonly used in baseband communications, the Fast Fourier Transform (FFT) algorithm, is implemented on an NVIDIA GPU using their general-purpose computing platform, the Compute Unified Device Architecture (CUDA). The paper first investigates the implementation of an FFT algorithm using the GPU hardware and exploiting the computational capability available. It then outlines the limitations discovered and the methods used to overcome these challenges. Finally, a new algorithm to compute the FFT is proposed, which reduces interprocessor communication. It is further optimized by improving memory access, enabling the processing rate to exceed 4 Gbps and achieving a processing time for a 512-point FFT of less than 200 ns using a two-GPU solution.

GPU-accelerated Multiphysics Simulation

Abstract:

In recent technology developments, General Purpose computation on Graphics Processing Units (GPGPU) has been recognized as a viable HPC technique. In this context, GPU acceleration is rooted in high-order Single Instruction Multiple Data (SIMD)/Single Instruction Multiple Thread (SIMT) vector-processing capability, combined with high-speed asynchronous I/O and a sophisticated parallel cache memory architecture. In this presentation we examine the enParallel, Inc. (ePX) approach to leveraging this technology for accelerated multiphysics computation.

As is well understood, both complexity and size impact realizable multiphysics simulation performance. Multiphysics applications by definition incorporate diverse model components, each of which employs characteristic algorithmic kernels (e.g. sparse/dense linear solvers, gradient optimizers, multidimensional FFT/IFFT, wavelet, random variate generators). This complexity is further increased by any requirement for structured communications across module boundaries (e.g. dynamic boundary conditions, multi-grid (re)discretization, and management of disparate time scales). Further, multiphysics applications tend toward large scale and long runtimes due to (a) the presence of multiple physical processes and (b) high-order discretization as a result of persistent nonlinearity, chaotic dynamics, etc. It follows that acceleration is highly motivated, and any associated performance optimization scheme must be sophisticated enough to address all salient aspects of process resource mapping, scheduling, and data movement. For the GPU-accelerated cluster, this remains a particularly important consideration because the GPU adds a degree of freedom to any choice of processing resource; multiphysics performance optimization then reduces to achieving the highest possible effective parallelism across all available HPC resources, each of which is associated with a characteristic process model.

GPU-Accelerated Algorithms for Gravitational Wave Detection

Abstract:

Gravitational waves are regarded as ripples through space and time. They have not been directly detected yet, and it is expected that their detection will allow us to "listen" to our universe for violent events such as supernovae, or events from the early universe such as the big bang. This would be a great way to detect the formation of black holes, especially pairs of spiralling black holes. Scientists are aiming to prove directly the existence of gravitational waves by using dedicated gravitational wave detectors. There are currently six working gravitational wave detectors in the world, and each of them records time-domain data from all sky directions with a time resolution of 100 µs. This easily adds up to terabytes of data, while it is still crucial for the detection and localisation algorithm to achieve real-time performance, so that follow-up observations with conventional telescopes can be made in the source direction to confirm that the waves are emitted by theoretical gravitational wave sources. Such observations will provide a firm proof for the existence of gravitational waves. The inspiral search is a very time-consuming process. It currently takes up to 50 central processing units (CPUs) to perform real-time analysis. It is expected that up to 800 CPUs will be needed if χ² tests are to be performed. χ² tests are used to determine the probability of detecting fake signals. The possibility of using graphics processing units (GPUs) for gravitational wave detection is investigated. I used the existing search pipelines for inspiral binary gravitational wave sources from LIGO (Laser Interferometer Gravitational-wave Observatory), which were run on conventional CPUs, replaced their Fast Fourier Transform (FFT) computation with the CUDA FFT library, and also applied data parallelism to the most time-consuming modules in the pipelines. The timing performance was compared with the original pipelines.
A preliminary result showed a 16-fold speedup. This implies that the current implementation of the algorithms on a GPU can achieve the computing power of 16 CPUs. The hardware cost and power consumption can be reduced significantly if computer clusters are built using GPUs for gravitational wave data analysis.
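
To make the hardware implication concrete, the short calculation below converts the CPU counts quoted in the abstract into GPU counts. It is only a back-of-the-envelope sketch: it assumes the preliminary 16-fold speedup applies uniformly to the whole workload, including the part behind the 800-CPU estimate, which the abstract does not claim. The function name `gpus_needed` is hypothetical.

```python
import math

def gpus_needed(cpu_count, speedup=16):
    """GPUs required to match cpu_count CPUs, assuming one GPU
    replaces `speedup` CPUs (an assumption, not a measured result)."""
    return math.ceil(cpu_count / speedup)

print(gpus_needed(50))   # prints 4
print(gpus_needed(800))  # prints 50
```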

GPU implementation of fast Gabor filters (IEEE)

Abstract

With their parallel multi-core architecture, Programmable Graphics Processing Units (GPUs) are well suited for implementing biologically-inspired visual processing algorithms, such as Gabor filtering. We compare several GPU implementations of Gabor filtering. On the same graphics card (an NVIDIA GeForce 9800 GTX+) and for convolution kernel radii from 8 to 48 pixels, an algorithm that decomposes Gabor filtering into a number of simpler steps is 2.2 to 33 times faster than direct 2D convolution and 2.8 to 6.6 times faster than an FFT-based approach. Surprisingly, in comparison with an optimized algorithm for Gabor filtering running on a PC (Core2 Duo 3.16GHz), the GPU algorithm is only 4 to 10 times faster. The PC can efficiently implement a recursive 1D filter, which requires far fewer arithmetic operations than convolution. However, due to data dependencies, this recursive filter typically runs slower than 1D convolution on the GPU. This highlights the importance of simultaneously considering both arithmetic and memory operations in porting algorithms to GPUs.
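
The abstract does not spell out which simpler steps the fast algorithm uses. One textbook decomposition, shown here purely as an illustration and not as the paper's method, is that a complex Gabor kernel with an isotropic Gaussian envelope factors into two 1D kernels, so a 2D convolution can be replaced by a row pass and a column pass. The names `gabor_1d`, `gabor_2d`, and `multiplies_per_pixel` are hypothetical.

```python
import cmath
import math

def gabor_1d(radius, sigma, omega):
    """1D complex Gabor: Gaussian envelope times a complex sinusoid."""
    return [math.exp(-t * t / (2 * sigma * sigma)) * cmath.exp(1j * omega * t)
            for t in range(-radius, radius + 1)]

def gabor_2d(radius, sigma, omega_x, omega_y):
    """2D complex Gabor with an isotropic Gaussian envelope,
    indexed as kernel[y][x]."""
    taps = range(-radius, radius + 1)
    return [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
             * cmath.exp(1j * (omega_x * x + omega_y * y))
             for x in taps] for y in taps]

def multiplies_per_pixel(radius):
    """Multiplications per output pixel: (direct 2D, two 1D passes)."""
    width = 2 * radius + 1
    return width * width, 2 * width

# The 2D kernel equals the outer product of two 1D kernels.
r, sigma, wx, wy = 3, 2.0, 0.7, 0.3
k2 = gabor_2d(r, sigma, wx, wy)
gx, gy = gabor_1d(r, sigma, wx), gabor_1d(r, sigma, wy)
max_err = max(abs(k2[y][x] - gy[y] * gx[x])
              for y in range(2 * r + 1) for x in range(2 * r + 1))
print(max_err < 1e-12)           # prints True
print(multiplies_per_pixel(8))   # prints (289, 34)
print(multiplies_per_pixel(48))  # prints (9409, 194)
```

The operation counts only cover arithmetic. As the abstract stresses, memory access patterns and data dependencies can dominate on a GPU, so fewer multiplications do not automatically mean a faster kernel.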

Paper available at IEEE.

GPU friendly Fast Poisson Solver for structured power grid network analysis (IEEE)

Abstract

In this paper, we propose a novel simulation algorithm for large-scale structured power grid networks. The new method formulates the traditional linear system as a special two-dimensional Poisson equation and solves it using an analytical expression based on the FFT technique. The computational complexity of the new algorithm is O(N lg N), which is much smaller than the complexity O(N^1.5) of traditional solvers for sparse matrices, such as the SuperLU solver and the PCG solver. Also, due to the special formulation, the graphics processing unit (GPU) can be exploited to further speed up the algorithm. Experimental results show that the new algorithm is stable and can achieve a 100X speedup on the GPU over the widely used SuperLU solver with a very small memory footprint.
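
The idea of solving a Poisson-type system with the FFT can be sketched as follows. This is a minimal CPU illustration under stated assumptions, not the paper's formulation: it assumes a 5-point stencil on a regular grid with periodic boundaries and power-of-two dimensions, whereas a real power grid model would have its own boundary conditions and conductance values. The names `apply_laplacian` and `solve_poisson_periodic` are hypothetical.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def fft2(a, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def apply_laplacian(u):
    """5-point stencil A u = 4u - (four neighbours), periodic wrap."""
    n, m = len(u), len(u[0])
    return [[4 * u[i][j] - u[(i - 1) % n][j] - u[(i + 1) % n][j]
             - u[i][(j - 1) % m] - u[i][(j + 1) % m]
             for j in range(m)] for i in range(n)]

def solve_poisson_periodic(f):
    """Solve A u = f for zero-mean f; returns the zero-mean solution.

    In Fourier space A is diagonal with eigenvalues
    4 - 2cos(2*pi*p/n) - 2cos(2*pi*q/m), so the solve is a
    pointwise division between a forward and an inverse FFT.
    """
    n, m = len(f), len(f[0])
    f_hat = fft2(f)
    u_hat = [[0j] * m for _ in range(n)]
    for p in range(n):
        for q in range(m):
            if p or q:  # skip the zero eigenvalue (constant mode)
                lam = (4 - 2 * math.cos(2 * math.pi * p / n)
                       - 2 * math.cos(2 * math.pi * q / m))
                u_hat[p][q] = f_hat[p][q] / lam
    u = fft2(u_hat, inverse=True)
    return [[v.real / (n * m) for v in row] for row in u]

# Round trip: build f from a known zero-mean u, then recover u.
n = 8
raw = [[float((3 * i + 5 * j) % 7) for j in range(n)] for i in range(n)]
mean = sum(map(sum, raw)) / (n * n)
u_true = [[v - mean for v in row] for row in raw]
u = solve_poisson_periodic(apply_laplacian(u_true))
print(max(abs(u[i][j] - u_true[i][j])
          for i in range(n) for j in range(n)) < 1e-9)  # prints True
```

The cost is dominated by the two 2D FFTs, which is where the O(N lg N) figure for N grid nodes comes from; the pointwise division is O(N).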

Paper available at IEEE.

GPU Friendly Fast Poisson Solver for Structured Power Grid Network Analysis (ACM)

Abstract:
In this paper, we propose a novel simulation algorithm for large-scale structured power grid networks. The new method formulates the traditional linear system as a special two-dimensional Poisson equation and solves it using an analytical expression based on the FFT technique. The computational complexity of the new algorithm is O(N lg N), which is much smaller than the complexity O(N^1.5) of traditional solvers for sparse matrices, such as the SuperLU solver and the PCG solver. Also, due to the special formulation, the graphics processing unit (GPU) can be exploited to further speed up the algorithm. Experimental results show that the new algorithm is stable and can achieve a 100X speedup on the GPU over the widely used SuperLU solver with a very small memory footprint.
