Stories, Papers, WIKIs

Zero pre-shared secret key establishment in the presence of jammers (ACM)

We consider the problem of key establishment over a wireless radio channel in the presence of a communication jammer, initially introduced in [13]. The communicating nodes are not assumed to pre-share any secret. The established key can later be used by a conventional spread-spectrum communication system. We introduce new communication concepts called intractable forward-decoding and efficient backward-decoding. Decoding under our mechanism requires at most twice the computation cost of the conventional SS decoding and one packet worth of signal storage. We introduce techniques that apply a key schedule to packet spreading and develop a provably optimal key schedule to minimize the bit-despreading cost. We also use efficient FFT-based algorithms for packet detection. We evaluate our techniques and show that they are efficient both in terms of resiliency against jammers and computation. Finally, our technique has additional features such as the inability to detect packet transmission until the last few bits are being transmitted, and transmissions being destination-specific. To the best of our knowledge, this is the first solution that is optimal in terms of communication energy cost with very little storage and computation overhead.

Paper available at ACM.
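
The abstract mentions FFT-based packet detection without giving details. As a rough illustration only (not the paper's scheme), the generic idea of locating a known spreading code in a received buffer by FFT-based circular cross-correlation can be sketched as follows. The function names, the 16-chip code, and the noise-free buffer are all assumptions made for this example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circular_correlation(received, code):
    """Return c[s] = sum_n received[(n + s) mod N] * conj(code[n]) via the correlation theorem."""
    n = len(received)
    spectrum_r = fft(received)
    spectrum_c = fft(code)
    product = [r * c.conjugate() for r, c in zip(spectrum_r, spectrum_c)]
    return [v / n for v in fft(product, inverse=True)]


def detect_offset(received, code):
    """Hypothetical helper: shift at which the zero-padded code best matches the buffer."""
    n = len(received)
    padded = list(code) + [0.0] * (n - len(code))
    corr = circular_correlation(received, padded)
    return max(range(n), key=lambda s: abs(corr[s]))


# Illustrative data (assumed): a 16-chip +/-1 code placed at offset 20 in a 64-sample buffer.
code = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, -1]
received = [0.0] * 64
for i, chip in enumerate(code):
    received[20 + i] = chip

print(detect_offset(received, code))  # 20
```

The whole correlation costs three FFTs of the buffer length instead of one dot product per candidate offset, which is the usual reason to do detection in the frequency domain. With noise or jamming added, the peak is no longer guaranteed to be at the true offset.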

VHF SAR Image Formation Implemented on a GPU (IEEE)

Abstract:
This paper will describe how off-the-shelf 3D graphics cards can be used for scientific computation like SAR processing. In particular, a highly efficient one-dimensional FFT and a fast direct (global) backprojection implementation will be presented and analyzed.

Using GPUs to Compute Large Out-of-Card FFTs (ACM)

Abstract:
The optimization of Fast Fourier Transform (FFT) problems that can fit into GPU memory has been studied extensively. On-card FFT libraries such as CUFFT can generally achieve much better performance than their counterparts on a CPU, as the data transfer between CPU and GPU is usually not counted in their performance. This high performance, however, is limited by the GPU memory size. When the FFT problem size increases, the data transfer between system and GPU memory can comprise a substantial part of the overall execution time. Therefore, optimizations for FFT problems that outgrow the GPU memory cannot bypass the tuning of data transfer between CPU and GPU. However, no prior study has attacked this problem. This paper is the first effort to use GPUs to efficiently compute large FFTs in the CPU memory of a single compute node.

In this paper, the performance of the PCI bus during the transfer of a batch of FFT subarrays is studied and a blocked buffer algorithm is proposed to improve the effective bandwidth. More importantly, several FFT decomposition algorithms are proposed so as to increase the data locality, further improve the PCI bus efficiency and balance computation between kernels. By integrating the above two methods, we demonstrate an out-of-card FFT optimization strategy and develop an FFT library that efficiently computes large 1D, 2D and 3D FFTs that cannot fit into the GPU's memory. On three of the latest GPUs, our large FFT library achieves much better double-precision performance than two of the most efficient CPU-based libraries, FFTW and Intel MKL. On average, our large FFTs on a single GeForce GTX480 are 46% faster than FFTW and 57% faster than MKL with multiple threads running on a four-core Intel i7 CPU. The speedup on a Tesla C2070 is 1.93x and 2.11x over FFTW and MKL. A peak performance of 21 GFLOPS is achieved for a 2D FFT of size 2048x65536 on C2070 with double precision.

Paper available at ACM.
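
The abstract does not spell out its decomposition algorithms. As a hedged sketch, the textbook Cooley-Tukey split below shows how one length-N transform (N = n1 * n2) breaks into batches of smaller, independent transforms, which is the general property that lets subarrays be shipped to a device one batch at a time. The function names are hypothetical, and the direct `dft` merely stands in for whatever on-card FFT would process each subarray.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, standing in for an on-card FFT of one small subarray."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def decomposed_fft(x, n1, n2):
    """Length n1*n2 DFT built from n2 transforms of length n1 and n1 of length n2."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: n2 independent length-n1 transforms over the strided subarrays x[c::n2].
    cols = [dft(x[c::n2]) for c in range(n2)]  # cols[c][k1]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i * c * k1 / n).
    for c in range(n2):
        for k1 in range(n1):
            cols[c][k1] *= cmath.exp(-2j * cmath.pi * c * k1 / n)
    # Step 3: n1 independent length-n2 transforms; output index is k1 + n1 * k2.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[c][k1] for c in range(n2)])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out


# Illustrative check on assumed data: the split agrees with the direct transform.
x = [complex(i % 7, (3 * i) % 5) for i in range(24)]
reference = dft(x)
result = decomposed_fft(x, 4, 6)
print(max(abs(a - b) for a, b in zip(reference, result)) < 1e-9)  # True
```

Each transform in steps 1 and 3 touches only its own subarray, so a batch of them can be transferred, computed, and written back independently. The strided access in step 1 is where data locality and transfer efficiency become the tuning problem the abstract describes.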

Using Commodity Graphics Hardware for Real-Time Digital Hologram View-Reconstruction (IEEE)

Abstract

View-reconstruction and display is an important part of many applications in digital holography such as computer vision and microscopy. Thus far, this has been an offline procedure for megapixel sized holograms. This paper introduces an implementation of real-time view-reconstruction using programmable graphics hardware. The theory of Fresnel-based view-reconstruction is introduced, after which an implementation using stream programming is presented. Two different fast Fourier transform (FFT)-based reconstruction methods are implemented, as well as two different FFT strategies.

The efficiency of the methods is evaluated and compared to a CPU-based implementation, providing over 100 times speedup for a hologram size of 2048 × 2048.

Paper available at IEEE.

Ultra-fast FFT protein docking on graphics processors (ACM)

Motivation: Modelling protein–protein interactions (PPIs) is an increasingly important aspect of structural bioinformatics. However, predicting PPIs using in silico docking techniques is computationally very expensive. Developing very fast protein docking tools will be useful for studying large-scale PPI networks, and could contribute to the rational design of new drugs.

Results: The Hex spherical polar Fourier protein docking algorithm has been implemented on Nvidia graphics processor units (GPUs). On a GTX 285 GPU, an exhaustive and densely sampled 6D docking search can be calculated in just 15 s using multiple 1D fast Fourier transforms (FFTs). This represents a 45-fold speed-up over the corresponding calculation on a single CPU, being at least two orders of magnitude faster than a similar CPU calculation using ZDOCK 3.0.1, and estimated to be at least three orders of magnitude faster than the GPU-accelerated version of PIPER on comparable hardware. Hence, for the first time, exhaustive FFT-based protein docking calculations may now be performed in a matter of seconds on a contemporary GPU. Three-dimensional Hex FFT correlations are also accelerated by the GPU, but the speed-up factor of only 2.5 is much less than that obtained with 1D FFTs. Thus, the Hex algorithm appears to be especially well suited to exploit GPUs compared to conventional 3D FFT docking approaches.

Availability: http://hex.loria.fr/ and http://hexserver.loria.fr/

Contact: dave.ritchie@loria.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

Paper available at ACM.

True 4D Image Denoising on the GPU

Abstract

The use of image denoising techniques is an important part of many medical imaging applications. One common application is
to improve the image quality of low-dose, i.e. noisy, computed tomography (CT) data. The medical imaging domain has seen a
tremendous development during the last decades. It is now possible to collect time resolved volumes, i.e. 4D data, with a number of
modalities (e.g. ultrasound (US), CT, magnetic resonance imaging (MRI)). While 3D image denoising previously has been applied
to several volumes independently, there has not been much work done on true 4D image denoising, where the algorithm considers
several volumes at the same time (and not a single volume at a time). By using all the dimensions, it is for example possible
to remove some of the time varying reconstruction artefacts that exist in CT volumes. The problem with 4D image denoising,
compared to 2D and 3D denoising, is that the computational complexity increases exponentially.
In this paper we describe a novel algorithm for true 4D image denoising, based on local adaptive filtering, and how to implement
it on the graphics processing unit (GPU). The algorithm was applied to a 4D CT heart dataset of the resolution 512 x 512 x 445 x 20.
The result is that the GPU can complete the denoising in about 25 minutes if spatial filtering is used and in about 8 minutes if FFT
based filtering is used. The CPU implementation requires several days of processing time for spatial filtering and about 50 minutes
for FFT based filtering. Fast spatial filtering makes it possible to apply the denoising algorithm to larger datasets (compared to if
FFT based filtering is used). The short processing time increases the clinical value of true 4D image denoising significantly.

 

YouTube video

http://www.youtube.com/watch?v=wflbt2sV34M

The Parallel Waves Simulation Based on GPU (IEEE)

Abstract:
Based on observations and research results from oceanography, this paper proposes a GPU-based method for ocean simulation. Advances in hardware, in particular the rapid development of GPUs and their high computing power, have made the GPU a new research hotspot. The paper uses an FFT algorithm to generate the height map, uses the GPU to improve the realism of the ocean, and uses OpenGL to render light and color. Building on these algorithms and technologies, we implement a wave simulation system. Experimental results show that the method achieves good realism and real-time performance.

Paper available at IEEE.

The FFT on a GPU (ACM)

Abstract:
The Fourier transform is a well known and widely used tool in many scientific and engineering fields. The Fourier transform is essential for many image processing techniques, including filtering, manipulation, correction, and compression. As such, the computer graphics community could benefit greatly from such a tool if it were part of the graphics pipeline. As of late, computer graphics hardware has become amazingly cheap, powerful, and flexible. This paper describes how to utilize the current generation of cards to perform the fast Fourier transform (FFT) directly on the cards. We demonstrate a system that can synthesize an image by conventional means, perform the FFT, filter the image, and finally apply the inverse FFT in well under 1 second for a 512 by 512 image. This work paves the way for performing complicated, real-time image processing as part of the rendering pipeline.

Paper available at ACM.
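
The pipeline named in the abstract (forward FFT, filter in the frequency domain, inverse FFT) can be illustrated on the CPU with a minimal sketch. This is not the paper's GPU implementation: the function names, the box-shaped low-pass filter, and the 8 by 8 test image are assumptions chosen to keep the example small.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2(img, inverse=False):
    """2D FFT by rows, then columns; the inverse is scaled by 1 / (height * width)."""
    rows = [fft(list(r), inverse) for r in img]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    out = [list(r) for r in zip(*cols)]
    if inverse:
        scale = len(img) * len(img[0])
        out = [[v / scale for v in r] for r in out]
    return out


def low_pass(img, cutoff):
    """Zero every bin whose wrapped frequency index exceeds cutoff on either axis."""
    h, w = len(img), len(img[0])
    spec = fft2(img)
    for u in range(h):
        for v in range(w):
            if min(u, h - u) > cutoff or min(v, w - v) > cutoff:
                spec[u][v] = 0j
    return [[v.real for v in row] for row in fft2(spec, inverse=True)]


# Illustrative image (assumed): a constant level of 5 plus a +/-1 checkerboard,
# which is the highest spatial frequency an 8 by 8 grid can hold.
image = [[5 + (-1) ** (x + y) for x in range(8)] for y in range(8)]
smoothed = low_pass(image, cutoff=1)
print(all(abs(v - 5) < 1e-9 for row in smoothed for v in row))  # True
```

The checkerboard lives entirely in the single bin at the highest frequency on both axes, so the low-pass filter removes it and leaves the constant level. A hard box filter like this one causes ringing on real images; smoother filter shapes are the usual choice in practice.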