Stories, Papers, WIKIs

Simulation of miscible binary mixtures based on lattice Boltzmann method (ACM)

Abstract:

Miscible fluid mixtures, such as honey poured into water or Coca-Cola into strong wine, are common in daily life. When two miscible fluids are mixed, their colors and shapes change because of the mixing interaction. The interaction between the mixture components can be regarded as a combination of a diffusing process and a demixing process. If the former dominates, the mixture is miscible; otherwise, it is immiscible. The complex microscopic interplay between the mixture components makes the simulation highly challenging. So far, there has been some dedicated research in computer graphics dealing with immiscible mixtures, but little work has focused on miscible mixtures. In this paper, for the first time, we introduce a two-fluid lattice Boltzmann method (LBM), called TFLBM, applied to miscible binary mixtures. Unlike other similar methods, our work treats the viscous and diffusing properties of the fluid separately, so that the physical insight is exposed more clearly and rationally. In addition, the operation of LBM is mostly a linear, local computation, and a graphics processing unit (GPU) has been used to achieve real-time simulation.


Note: Requires ACMPortal subscription to view in full.

Reliability Modeling of MEMS devices on CUDA based HPC Setup

Abstract:

In this paper we review developments in CUDA and the implementation, on a CUDA setup, of various distributions used in reliability analysis of MEMS-based devices. These distributions can be highly optimized so that the system can be simulated efficiently on CUDA. We show that the type of distribution may vary from exponential to binomial to others that have been proposed recently. Reliability modeling codes to calculate the reliability function, failure rate function, mean time to failure (MTTF), and mean residual time (MRT) are proposed for MEMS technology. It is observed that high performance computing (HPC) can be used to optimize reliability calculations and help accelerate research on the reliability of MEMS. The three key abstractions of CUDA (a hierarchy of thread groups, shared memories, and barrier synchronization) are exposed as a set of extensions to the C language, which provide fine-grained data parallelism and thread parallelism, nested within coarse-grained data parallelism and task parallelism. The key is to divide the computations of reliability analysis into coarse sub-problems that can be solved independently in parallel, and then into finer pieces that can be solved cooperatively in parallel. Because threads solve each sub-problem cooperatively, this division preserves the expressivity of the language. Each sub-problem can be scheduled on any of the available processor cores, which allows transparent scalability. Thus the computations of reliability analysis can be performed by a compiled CUDA program that executes on any number of GPU cores. The programmer need not know the exact configuration; only the runtime system needs to know the physical processor count.
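
As a minimal sketch of the quantities named in this abstract, the Python snippet below evaluates the reliability function, failure rate, MTTF, and mean residual time for an exponential lifetime distribution. The failure rate value, the function names, and the numerical integration settings are illustrative assumptions and are not taken from the paper. Each time point is evaluated independently, which is the kind of work that could map to one GPU thread per point.

```python
import math

def reliability(t, lam):
    """R(t) = exp(-lam * t): probability that a device survives past time t."""
    return math.exp(-lam * t)

def failure_rate(t, lam):
    """Hazard h(t) = f(t) / R(t); constant (= lam) for the exponential case."""
    pdf = lam * math.exp(-lam * t)
    return pdf / reliability(t, lam)

def mttf(lam):
    """Mean time to failure of an exponential lifetime: 1 / lam."""
    return 1.0 / lam

def mean_residual_time(t, lam, horizon=50.0, steps=20000):
    """MRT(t) = (1 / R(t)) * integral of R(x) dx from t to infinity.

    The integral is truncated at t + horizon / lam and evaluated with the
    trapezoidal rule (both settings are assumptions of this sketch).
    """
    upper = t + horizon / lam
    h = (upper - t) / steps
    total = 0.5 * (reliability(t, lam) + reliability(upper, lam))
    for k in range(1, steps):
        total += reliability(t + k * h, lam)
    return total * h / reliability(t, lam)

LAM = 1e-3  # hypothetical failure rate, failures per hour

for t in (0.0, 100.0, 1000.0):  # each time point is independent of the others
    print(t, reliability(t, LAM), failure_rate(t, LAM), mean_residual_time(t, LAM))
print("MTTF:", mttf(LAM))
```

For the exponential distribution the mean residual time equals the MTTF at every age, which gives a convenient sanity check on the numerical integration.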

Performance and Accuracy of Lattice-Boltzmann Kernels on Multi- and Manycore Architectures

Abstract:  

 

We present different kernels based on Lattice-Boltzmann methods for the solution of the two-dimensional Shallow Water and Navier-Stokes equations on fully structured lattices. The functionality ranges from simple scenarios like open-channel flows with planar beds to simulations with complex scene geometries like solid obstacles and non-planar bed topography with dry states, and even interaction of the fluid with floating objects. The kernels are integrated into a hardware-oriented collection of libraries targeting multiple fundamentally different parallel hardware architectures like commodity multicore CPUs, the Cell BE, NVIDIA GPUs and clusters. We provide an algorithmic study which compares the different solvers in terms of performance and numerical accuracy in view of their capabilities and their specific implementation and optimisation on the different architectures. We show that an eightfold speedup over optimised multithreaded CPU code can be obtained with the GPU using basic methods and that even very complex flow phenomena can be simulated with significant speedups without loss of accuracy.
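
To make the term "Lattice-Boltzmann kernel" concrete, the following is a minimal Python sketch of the collision step of a standard D2Q9 BGK scheme at a single lattice node. It is a textbook formulation under stated assumptions (lattice units, relaxation time `tau` chosen arbitrarily, made-up input distributions) and is not the kernels described in the paper. The point it illustrates is locality: the update at one node reads only that node's nine values, which is why the method parallelizes well.

```python
# D2Q9 lattice: nine discrete velocities and their standard weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4

def moments(f):
    """Density and velocity recovered from the nine distributions at a node."""
    rho = sum(f)
    ux = sum(fi * cx for fi, (cx, _) in zip(f, C)) / rho
    uy = sum(fi * cy for fi, (_, cy) in zip(f, C)) / rho
    return rho, ux, uy

def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for the D2Q9 lattice."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq

def bgk_collide(f, tau):
    """Relax the node's distributions toward equilibrium (BGK operator)."""
    rho, ux, uy = moments(f)
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]

# Hypothetical non-equilibrium state at one node.
f_pre = [0.40, 0.12, 0.10, 0.11, 0.09, 0.03, 0.02, 0.03, 0.04]
f_post = bgk_collide(f_pre, tau=0.8)

print("before:", moments(f_pre))
print("after: ", moments(f_post))
```

The collision leaves density and momentum unchanged up to round-off. A full solver would follow it with a streaming step that moves each value to the neighbouring node along its velocity.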

A Hybrid Analytical DRAM Performance Model, 5th Workshop on Modeling

Abstract:
 
 
As process technology scales, the number of transistors that can fit in a unit area has increased exponentially. Processor throughput, memory storage, and memory throughput have all been increasing at an exponential pace. As such, DRAM has become an ever-tightening bottleneck for applications with irregular memory access patterns. Computer architects in industry sometimes use ad hoc analytical modeling techniques in lieu of cycle-accurate performance simulation to identify critical design points. Moreover, analytical models can provide clear mathematical relationships for how system performance is affected by individual microarchitectural parameters, something that may be difficult to obtain with a detailed performance simulator. Modern DRAM controllers rely on out-of-order scheduling policies to increase row access locality and decrease the overheads of timing constraint delays. This paper proposes a hybrid analytical DRAM performance model that uses memory address traces to predict the DRAM efficiency of a DRAM system when using such a memory scheduling policy. To stress our model, we use a massively multithreaded architecture based upon contemporary GPUs to generate our memory address trace. We test our techniques on a set of real CUDA applications and find our hybrid analytical model predicts the DRAM efficiency to within 15.2% absolute error when arithmetically averaged across all applications.

Porous Rock Simulations and Lattice Boltzmann on GPUs

Abstract:

 

Investigating how fluids flow inside the complicated geometries of porous rocks is an important problem in the petroleum industry. The lattice Boltzmann method (LBM) can be used to calculate the permeability of porous rocks. In this paper, we show how to implement this method efficiently on modern GPUs. Both a sequential CPU implementation and a parallelized GPU implementation are developed. Both implementations were tested using three porous data sets with known permeabilities. Our work shows that it is possible to calculate the permeability of porous rocks for simulation sizes up to 368^3, which fit into the 4 GB memory of the NVIDIA Quadro FX 5800 card. Our single-precision floating-point simulation on the CPU achieved a respectable 0.95-1.59 MLUPS (million lattice updates per second), whereas our GPU implementation achieved a remarkable 180+ MLUPS for several lattices in the 160^3 to 368^3 range, allowing calculations that would take hours on the CPU to be done in minutes on the GPU. Techniques for reducing round-off errors are also discussed and implemented.
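
The figures in this abstract can be checked with some back-of-the-envelope arithmetic. The Python sketch below assumes a D3Q19 lattice (19 distribution values per site), 4-byte single-precision values, and a single stored copy of the distributions; these are assumptions of the sketch, as the abstract does not state the lattice model or storage layout. The iteration count is hypothetical as well.

```python
def lattice_bytes(n, q=19, bytes_per_value=4, copies=1):
    """Memory for an n x n x n lattice with q distribution values per site."""
    return n ** 3 * q * bytes_per_value * copies

def mlups(n, iterations, seconds):
    """Million lattice updates per second for an n^3 lattice."""
    return n ** 3 * iterations / seconds / 1e6

def seconds_for(n, iterations, rate_mlups):
    """Wall-clock time needed at a given MLUPS rate."""
    return n ** 3 * iterations / (rate_mlups * 1e6)

GIB = 2 ** 30
n = 368
print("lattice size in GiB:", lattice_bytes(n) / GIB)
print("fits in 4 GiB:", lattice_bytes(n) < 4 * GIB)

iterations = 1000  # hypothetical number of time steps
print("CPU hours at 1.59 MLUPS:", seconds_for(n, iterations, 1.59) / 3600)
print("GPU minutes at 180 MLUPS:", seconds_for(n, iterations, 180.0) / 60)
```

Under these assumptions the 368^3 lattice occupies a little over 3.5 GiB, and a run that takes several hours at the CPU rate takes a few minutes at the GPU rate, consistent with the claims quoted above.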

Spatial Sound for Video Games and Virtual Environments Utilizing Real-Time GPU-Based Convolution

Abstract:

 

The generation of spatial audio is computationally very demanding, and accurate spatial audio is therefore typically
overlooked in games and virtual environment applications, leading to a decrease in both performance and the user's
sense of presence or immersion. Driven by the gaming industry and the great emphasis placed on the visual sense,
consumer computer graphics hardware (and the graphics processing unit in particular) has greatly advanced in recent
years, even outperforming the computational capacity of CPUs. This has allowed for real-time, interactive, realistic
graphics-based applications on typical consumer-level PCs. Despite the many similarities between the fields of spatial
audio and computer graphics, computer graphics (and image synthesis in particular) has advanced far beyond spatial
audio, given the emphasis placed on the generation of believable visual cues over other perceptual cues, including
auditory ones. Given the widespread use and availability of computer graphics hardware, as well as the similarities that exist
between the fields of spatial audio and image synthesis, this work investigates the application of graphics processing
units to the computationally efficient generation of spatial audio for dynamic and interactive games and virtual environments.
Here we present a real-time GPU-based convolution method and illustrate its superior efficiency over conventional,
software-based, time-domain convolution.

A comparative study on ASIC, FPGAs, GPUs and general purpose processors in the O(N^2) gravitational N-body simulation

Abstract:

 

In this paper, we describe the implementation of gravitational force calculation for N-body simulations in the context of astrophysics. We describe high performance implementations on general purpose processors, GPUs, and FPGAs, and compare them using a number of criteria including speed, power efficiency, and cost of development. These results show that, for gravitational force calculation and many-body simulations in general, GPUs are very competitive in terms of performance and performance per dollar, whereas FPGAs are competitive in terms of performance per Watt.
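
The O(N^2) cost in the title comes from summing the pairwise gravitational pull of every body on every other body. A minimal Python sketch of that direct summation follows. The softening length `eps`, the unit choice `G = 1`, and the sample bodies are assumptions of this sketch, not details from the paper.

```python
import math

def accelerations(pos, mass, G=1.0, eps=1e-3):
    """Direct-summation gravitational acceleration on each body, O(N^2).

    pos  : list of [x, y, z] positions
    mass : list of masses
    eps  : softening length that avoids division by zero at tiny separations
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):              # the outer loop is independent per body
        for j in range(n):
            if i == j:
                continue
            d = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2] + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * d[k] * inv_r3
    return acc

# Three hypothetical bodies.
positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
masses = [2.0, 3.0, 1.0]
for a in accelerations(positions, masses):
    print(a)
```

Because the result for body `i` depends only on read-only positions and masses, the outer loop can be distributed across parallel hardware with one body (or a tile of bodies) per processing element.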

Testing the Feasibility of Running a Computationally Intensive Real-Time Traffic Simulation on a Multicore Programmable Graphics Processor

Abstract:

 

In the 1960s, a semiconductor scientist named Gordon Moore theorized that the
number of transistors on a single integrated circuit would double each year. Through
much effort, the semiconductor industry has been able to closely follow “Moore’s Law”,
but new information shows this type of progress is not sustainable in the coming years.
This realization has implications in both chip fabrication and software development.
Instead of making chips with more transistors per unit area, industry now produces newer
multicore chips. These multicore chips, which have more than one traditional
computational unit, have long been used for computer graphics, but now researchers are
putting their improved throughput to use in computational simulation. New research
documents efforts to speed up traditional simulations on multicore graphics processing
units (GPUs), both to help simulation efforts and to learn about multicore chip potential.
In addition to this new research style, given the acronym GPGPU (Graphics Processor as
a General Processing Unit), there is a social desire to produce improved automotive
traffic safety systems. It is natural, then, for research to pursue a faster
traffic simulation using state-of-the-art GPUs. In fact, this document details a research
project aimed at testing the feasibility of running a traffic simulation using the GPGPU
paradigm. 

Energy-Aware High Performance Computing with Graphic Processing Units

Abstract:

 

The use of Graphics Processing Units (GPUs) in general purpose computing has been shown to
yield significant performance benefits for applications ranging from scientific computing to database sorting and
search. The emergence of high-level APIs facilitates GPU programming to the point that general purpose computing
with GPUs is now considered a viable system design and programming option. Nevertheless, the inclusion of a GPU
in general purpose computing results in an associated increase in the system’s power budget. This paper
presents an experimental investigation into the power and energy cost of GPU operations and a cost/performance
comparison versus a CPU-only system. Through real-time energy measurements obtained using a novel platform
called LEAP-Server, we show that using a GPU results in energy savings if the performance gain is above a certain
bound. We show this bound for an example experiment tested by LEAP-Server. 
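
One simple way to see why such a bound exists is sketched below in Python. It assumes that energy is average power multiplied by runtime and that average power stays constant during the run. The wattages and runtimes are hypothetical, and the paper's measured bound may be defined differently.

```python
def energy_joules(avg_power_watts, runtime_seconds):
    """Energy consumed at a constant average power."""
    return avg_power_watts * runtime_seconds

def break_even_speedup(p_cpu_only, p_with_gpu):
    """Speedup above which the GPU system uses less energy.

    From p_with_gpu * (t / s) < p_cpu_only * t it follows that
    s > p_with_gpu / p_cpu_only.
    """
    return p_with_gpu / p_cpu_only

def gpu_saves_energy(p_cpu_only, p_with_gpu, speedup):
    return speedup > break_even_speedup(p_cpu_only, p_with_gpu)

P_CPU_ONLY = 200.0   # hypothetical average system power without a GPU, watts
P_WITH_GPU = 350.0   # hypothetical average system power with a GPU, watts
T_CPU = 100.0        # hypothetical CPU-only runtime, seconds

print("break-even speedup:", break_even_speedup(P_CPU_ONLY, P_WITH_GPU))
for s in (1.5, 2.0):
    e_cpu = energy_joules(P_CPU_ONLY, T_CPU)
    e_gpu = energy_joules(P_WITH_GPU, T_CPU / s)
    print(s, e_cpu, e_gpu, gpu_saves_energy(P_CPU_ONLY, P_WITH_GPU, s))
```

In this simplified model, a GPU that raises system power by 75% pays off in energy only when it shortens the runtime by more than the same factor.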

GPU-Based One-Dimensional Convolution for Real-Time Spatial Sound Generation

Abstract:

 

Incorporating spatialized (3D) sound cues in dynamic and interactive videogames and immersive virtual environment applications is beneficial for a number of reasons, ultimately leading to an increase in presence and immersion. Despite the benefits of spatial sound cues, they are often overlooked in videogames and virtual environments, where emphasis is typically placed on the visual cues. Fundamental to the generation of spatial sound is the one-dimensional convolution operation, which is computationally expensive and does not lend itself to such real-time, dynamic applications. Driven by the gaming industry and the great emphasis placed on the visual sense, consumer computer graphics hardware, and the graphics processing unit (GPU) in particular, has greatly advanced in recent years, even outperforming the computational capacity of CPUs. This has allowed for real-time, interactive, realistic graphics-based applications on typical consumer-level PCs. Given the widespread use and availability of computer graphics hardware and the similarities that exist between the fields of spatial audio and image synthesis, here we describe the development of a GPU-based, one-dimensional convolution algorithm whose efficiency is superior to the conventional CPU-based convolution method. The primary purpose of the developed GPU-based convolution method is the computationally efficient generation of real-time spatial audio for dynamic and interactive videogames and virtual environments.
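
For reference, the conventional time-domain, one-dimensional convolution that the abstract compares against can be sketched in a few lines of Python. This is the plain CPU formulation, not the paper's GPU algorithm, and the sample signal and filter values are made up.

```python
def convolve_direct(signal, kernel):
    """Full linear convolution: out[n] = sum over k of kernel[k] * signal[n - k]."""
    n_out = len(signal) + len(kernel) - 1
    out = [0.0] * n_out
    for n in range(n_out):              # each output sample is independent
        acc = 0.0
        for k in range(len(kernel)):
            i = n - k
            if 0 <= i < len(signal):    # skip taps that fall outside the signal
                acc += kernel[k] * signal[i]
        out[n] = acc
    return out

# Hypothetical audio samples and a short filter impulse response.
samples = [1.0, 2.0, 3.0]
impulse_response = [1.0, 1.0]
print(convolve_direct(samples, impulse_response))
```

The cost is proportional to the product of the signal and filter lengths, which is what makes the operation expensive for long filters. Since no output sample depends on another, the outer loop is a natural candidate for one GPU thread per output sample.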