Stories, Papers, WIKIs
| Title | Body |
|---|---|
| Simulation of miscible binary mixtures based on lattice Boltzmann method (ACM) |
Abstract: Miscible fluid mixtures, such as honey poured into water or Coca Cola into strong wine, are common phenomena in daily life. When two miscible fluids are mixed, their appearance in terms of color and shape changes due to their mixing interaction. The interaction between the mixture components can be regarded as a combination of a diffusing process and a demixing process. If the former dominates the interaction, the mixture is miscible; otherwise, it is immiscible. The complex microscopic interplay between the mixture components makes the simulation highly challenging. So far, there has been some dedicated research in computer graphics dealing with immiscible mixtures, but little work has focused on miscible mixtures. In this paper, for the first time, we introduce a two-fluid lattice Boltzmann method (LBM), called TFLBM, applied to miscible binary mixtures. Unlike other similar methods, the viscous and diffusing properties of the fluid in our work are considered separately, so that the physical insight is exposed more clearly and rationally. In addition, the operation of LBM is mostly a linear local computation, and a graphics processing unit (GPU) has been utilized to achieve real-time simulation.
Note: Requires ACMPortal subscription to view in full. |
| Reliability Modeling of MEMS devices on CUDA based HPC Setup |
Abstract: In this paper we review developments in CUDA and the implementation of the various distributions used in reliability analysis of MEMS-based devices on a CUDA setup. The various distributions can be highly optimized so that the system can be simulated efficiently on CUDA. We show that the type of distribution may vary from exponential to binomial to others that have been proposed recently. Reliability modeling codes to calculate the reliability function, failure rate function, mean time to failure (MTTF) and mean residual time (MRT) are proposed for MEMS technology. It is observed that High Performance Computing (HPC) can be used to optimize reliability calculations and help accelerate research in the reliability of MEMS. The three key abstractions of CUDA (i.e. a hierarchy of thread groups, shared memories, and barrier synchronization) are exposed as a set of extensions to the C language, which provides fine-grained data parallelism and thread parallelism, nested within coarse-grained data parallelism and task parallelism. The key is dividing the computations of reliability analysis into coarse sub-problems that can be solved independently in parallel, and then into finer pieces that can be solved cooperatively in parallel. By allowing threads to solve each sub-problem cooperatively, this division of the problem preserves the expressivity of the language. Each sub-problem can be scheduled on any of the available processor cores, allowing transparent scalability. Thus the computations of reliability analysis can be performed by a compiled CUDA program that can execute on any number of GPU cores. The programmer need not know the exact configuration; only the runtime system needs to know the physical processor count. |
| Performance and Accuracy of Lattice-Boltzmann Kernels on Multi- and Manycore Architectures |
Abstract:
We present different kernels based on Lattice-Boltzmann methods for the solution of the two-dimensional Shallow Water and Navier-Stokes equations on fully structured lattices. The functionality ranges from simple scenarios like open-channel flows with planar beds to simulations with complex scene geometries like solid obstacles and non-planar bed topography with dry states and even interaction of the fluid with floating objects. The kernels are integrated into a hardware-oriented collection of libraries targeting multiple fundamentally different parallel hardware architectures like commodity multicore CPUs, the Cell BE, NVIDIA GPUs and clusters. We provide an algorithmic study which compares the different solvers in terms of performance and numerical accuracy in view of their capabilities and their specific implementation and optimisation on the different architectures. We show that an eightfold speedup over optimised multithreaded CPU code can be obtained with the GPU using basic methods and that even very complex flow phenomena can be simulated with significant speedups without loss of accuracy. |
| A Hybrid Analytical DRAM Performance Model, 5th Workshop on Modeling |
Abstract:
As process technology scales, the number of transistors that can fit in a unit area has increased exponentially. Processor throughput, memory storage, and memory throughput have all been increasing at an exponential pace. As such, DRAM has become an ever-tightening bottleneck for applications with irregular memory access patterns. Computer architects in industry sometimes use ad hoc analytical modeling techniques in lieu of cycle-accurate performance simulation to identify critical design points. Moreover, analytical models can provide clear mathematical relationships for how system performance is affected by individual microarchitectural parameters, something that may be difficult to obtain with a detailed performance simulator. Modern DRAM controllers rely on Out-of-Order scheduling policies to increase row access locality and decrease the overheads of timing constraint delays. This paper proposes a hybrid analytical DRAM performance model that uses memory address traces to predict the DRAM efficiency of a DRAM system when using such a memory scheduling policy. To stress our model, we use a massively multithreaded architecture based upon contemporary GPUs to generate our memory address trace. We test our techniques on a set of real CUDA applications and find our hybrid analytical model predicts the DRAM efficiency to within 15.2% absolute error when arithmetically averaged across all applications.
|
| Porous Rock Simulations and Lattice Boltzmann on GPUs |
Abstract:
Investigating how fluids flow inside the complicated geometries of porous rocks is an important problem in the petroleum industry. The lattice Boltzmann method (LBM) can be used to calculate porous rocks’ permeability. In this paper, we show how to implement this method efficiently on modern GPUs. Both a sequential CPU implementation and a parallelized GPU implementation are developed. Both implementations were tested using three porous data sets with known permeabilities. Our work shows that it is possible to calculate the permeability of porous rocks for simulation sizes up to 368³, which fit into the 4 GB memory of the NVIDIA Quadro FX 5800 card. Our single-precision floating-point simulation resulted in a respectable 0.95-1.59 MLUPS, whereas our GPU implementation achieved a remarkable 180+ MLUPS for several lattices in the 160³ to 368³ range, allowing calculations that would take hours on the CPU to be done in minutes on the GPU. Techniques for reducing round-off errors are also discussed and implemented. |
| Spatial Sound for Video Games and Virtual Environments Utilizing Real-Time GPU-Based Convolution |
Abstract:
The generation of spatial audio is computationally very demanding and therefore, accurate spatial audio is typically |
| A comparative study on ASIC, FPGAs, GPUs and general purpose processors in the O(N²) gravitational N-body simulation |
Abstract:
In this paper, we describe the implementation of gravitational force calculation for N-body simulations in the context of astrophysics. We describe high-performance implementations on general purpose processors, GPUs, and FPGAs, and compare them using a number of criteria including speed, power efficiency and cost of development. These results show that, for gravitational force calculation and many-body simulations in general, GPUs are very competitive in terms of performance and performance per dollar, whereas FPGAs are competitive in terms of performance per Watt. |
| Testing the Feasibility of Running a Computationally Intensive Real-Time Traffic Simulation on a Multicore Programmable Graphics Processor |
Abstract:
In the 1960s, a semiconductor scientist named Gordon Moore theorized that the |
| Energy-Aware High Performance Computing with Graphic Processing Units |
Abstract:
The use of Graphics Processing Units (GPUs) in general purpose computing has been shown to |
| GPU-Based One-Dimensional Convolution for Real-Time Spatial Sound Generation |
Abstract:
Incorporating spatialized (3D) sound cues in dynamic and interactive videogames and immersive virtual environment applications is beneficial for a number of reasons, ultimately leading to an increase in presence and immersion. Despite the benefits of spatial sound cues, they are often overlooked in videogames and virtual environments where, typically, emphasis is placed on the visual cues. Fundamental to the generation of spatial sound is the one-dimensional convolution operation, which is computationally expensive and does not lend itself to such real-time, dynamic applications. Driven by the gaming industry and the great emphasis placed on the visual sense, consumer computer graphics hardware, and the graphics processing unit (GPU) in particular, has greatly advanced in recent years, even outperforming the computational capacity of CPUs. This has allowed for real-time, interactive, realistic graphics-based applications on typical consumer-level PCs. Given the widespread use and availability of computer graphics hardware and the similarities that exist between the fields of spatial audio and image synthesis, here we describe the development of a GPU-based, one-dimensional convolution algorithm whose efficiency is superior to the conventional CPU-based convolution method. The primary purpose of the developed GPU-based convolution method is the computationally efficient generation of real-time spatial audio for dynamic and interactive videogames and virtual environments. |
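
The one-dimensional convolution that the last entry identifies as the core of spatial sound generation can be illustrated with a small reference sketch. This is a plain direct-form CPU version, not the GPU algorithm from the paper; the function name `convolve_1d` and the sample values are hypothetical and chosen only for illustration.

```python
def convolve_1d(signal, kernel):
    """Direct-form full convolution: out[n] = sum over k of signal[k] * kernel[n - k].

    Hypothetical helper for illustration; cost is O(len(signal) * len(kernel)),
    which is why long audio filters are expensive on a CPU.
    """
    if not signal or not kernel:
        return []
    out = [0.0] * (len(signal) + len(kernel) - 1)
    # Each input sample contributes a scaled copy of the kernel to the output.
    for i, s in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    # Made-up three-sample signal and two-tap filter.
    print(convolve_1d([1, 2, 3], [1, 1]))  # [1.0, 3.0, 5.0, 3.0]
```

Every output sample is an independent sum of products, which is the property that makes the operation a natural candidate for data-parallel hardware such as a GPU.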

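The O(N²) cost named in the gravitational N-body entry above comes from evaluating every pairwise interaction. A minimal direct-summation sketch follows, assuming Newtonian gravity with an optional softening length; the function name `accelerations`, its parameters, and the two-body sample data are hypothetical and are not taken from the paper.

```python
import math


def accelerations(positions, masses, G=1.0, softening=0.0):
    """Direct-sum gravitational acceleration on each body (hypothetical helper).

    positions: list of (x, y, z) tuples; masses: list of floats.
    With softening == 0.0, two bodies at the same position divide by zero.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):  # N * (N - 1) pair evaluations in total
            if i == j:
                continue
            dx = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(d * d for d in dx) + softening * softening
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * masses[j] * dx[k] * inv_r3
    return acc


if __name__ == "__main__":
    # Two bodies one unit apart along x, with masses 1 and 2 (made-up values).
    print(accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 2.0]))
    # [[2.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
```

The inner loop body is identical for every pair and has no data dependence between different values of `i`, which is why this calculation maps well onto GPUs, FPGAs, and dedicated hardware.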