Stories, Papers, WIKIs

Introducing GMAC

We are proud to announce the first public version of GMAC.

GMAC is a user-level library that implements an Asymmetric Distributed Shared Memory model to be used by CUDA programs. An ADSM model builds a global memory space that allows CPU code to transparently access data hosted in accelerators' (GPUs) memories. Moreover, the coherency of the data is automatically handled by the library. This removes the necessity for manual memory transfers (cudaMemcpy) between the host and GPU memories.

GMAC is being developed by the Operating System Group at the Universitat Politecnica de Catalunya and the IMPACT Research Group at the University of Illinois under the University of Illinois/NCSA Open Source License.

The project is hosted here. There you can find documentation, code and pre-built Debian packages.

Simulation of miscible binary mixtures based on lattice Boltzmann method (ACM)

Abstract:

Miscible fluid mixtures, like pouring honey into water or Coca Cola into strong wine, are common phenomena in our daily life. While two miscible fluids are mixed together, their appearances in terms of colors and shapes will change due to their mixing interaction. The interaction between the mixture components could be regarded as a combination of the diffusing process and demixing process. If the former dominates the interaction, it is miscible; otherwise, it is immiscible. The complex microscopic interplay between the mixture components makes the simulation highly challenging. So far, there has been some dedicated research in computer graphics dealing with immiscible mixtures, but little work has focused on miscible mixtures. In this paper, for the first time, we introduce a two-fluid lattice Boltzmann method (LBM), called TFLBM, applied to miscible binary mixtures. Different from other similar methods, the viscous and diffusing properties of the fluid in our work are considered separately, so that the physical insight is exposed more clearly and rationally. In addition, the operation of LBM is mostly a linear local computation, and a graphics processing unit (GPU) has been utilized to achieve real-time simulation.


Note: Requires ACMPortal subscription to view in full.

Simulation of atmospheric binary mixtures based on two-fluid model

Abstract:

Atmospheric binary mixtures such as tornadoes and sandstorms are common natural phenomena in our daily life. There are two fluid systems in these phenomena, which are air flow (wind field) and dust particle flow. Due to the complex mechanism of two fluid systems and the interaction between them, little work has been done on simulating these phenomena. In this paper, for the first time, we have simulated such two fluid phenomena under a unified framework by a Reynolds-average two-fluid model (RATFM) based on the Navier–Stokes equations. In RATFM, the air flow and dust particle flow are simulated accurately by two different Navier–Stokes equations, respectively. The interaction between two fluids is also simulated by introducing an interaction force. Then, a RATFM solver on GPU is designed to achieve fast simulation. In addition, multiple scattering effects of the participating media are considered for realistic rendering.

Note: Requires ScienceDirect subscription to view in full.

Large Simulations of Shear Flow in Mixtures via the Lattice Boltzmann Equation

Abstract:

Fluid dynamics presents many computational challenges, particularly in the area of complex fluids, where microscopic/mesoscopic details of the fluid components are important in addition to the bulk properties such as the viscosity. One useful method for studying such systems is based on the lattice Boltzmann equation (LBE) for the incompressible Navier-Stokes equations (for a review see e.g., ref. 1). The LBE provides a natural way for the microscopic details—e.g., composition, liquid crystal ordering, and so on—to be coupled to the fluid flow. In addition, by relaxing the constraint of exact incompressibility the LBE allows the fluid pressure to be computed locally, and thus is extremely well suited to parallel computation. 
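As a rough illustration of why the LBE suits parallel hardware, the following minimal sketch implements one collide-and-stream update of a generic single-fluid D2Q9 BGK lattice Boltzmann scheme in plain Python. It is not the method of any paper listed here; the D2Q9 lattice, the relaxation time `tau`, the periodic boundaries, and all function names are assumptions chosen for brevity. The collision touches only the values stored at one node, and streaming copies each value to a fixed neighbor, which is what keeps the update local.

```python
# Generic D2Q9 BGK sketch (assumed model, not taken from the papers above).
# 9 discrete velocities and their standard lattice weights.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium distribution for a single node."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(VELOCITIES, WEIGHTS):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def collide(node, tau):
    """BGK collision: reads and writes only the 9 values of this node."""
    rho = sum(node)
    ux = sum(f * cx for f, (cx, _) in zip(node, VELOCITIES)) / rho
    uy = sum(f * cy for f, (_, cy) in zip(node, VELOCITIES)) / rho
    feq = equilibrium(rho, ux, uy)
    return [f - (f - fe) / tau for f, fe in zip(node, feq)]


def step(grid, tau):
    """One collide-and-stream update on a periodic nx-by-ny grid."""
    nx, ny = len(grid), len(grid[0])
    post = [[collide(grid[x][y], tau) for y in range(ny)] for x in range(nx)]
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for i, (cx, cy) in enumerate(VELOCITIES):
                # Streaming: move value i to the neighbor in direction i.
                new[(x + cx) % nx][(y + cy) % ny][i] = post[x][y][i]
    return new


def total_mass(grid):
    """Sum of all distribution values, i.e. the total density."""
    return sum(sum(node) for column in grid for node in column)


def make_grid(nx, ny):
    """Fluid at rest with a small density bump at the center node."""
    grid = [[equilibrium(1.0, 0.0, 0.0) for _ in range(ny)] for _ in range(nx)]
    grid[nx // 2][ny // 2] = equilibrium(1.1, 0.0, 0.0)
    return grid
```

For example, `step(make_grid(8, 8), tau=0.8)` advances an 8-by-8 grid by one time step. `total_mass` stays constant across steps because the collision conserves density and streaming only moves values between nodes.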

 

TeraFLOP computing on a desktop PC with GPUs for 3D CFD

Abstract:

A very efficient implementation of a lattice Boltzmann (LB) kernel in 3D on a graphical processing unit using the compute unified device architecture interface developed by nVIDIA is presented. By exploiting the explicit parallelism offered by the graphics hardware, we obtain an efficiency gain of up to two orders of magnitude with respect to the computational performance of a PC. A non-trivial example shows the performance of the LB implementation, which is based on a D3Q13 model that is described in detail.

Accelerating geoscience and engineering system simulations on graphics hardware (ACM)

Abstract:

Many complex natural systems studied in the geosciences are characterized by simple local-scale interactions that result in complex emergent behavior. Simulations of these systems, often implemented in parallel using standard central processing unit (CPU) clusters, may be better suited to parallel processing environments with large numbers of simple processors. Such an environment is found in graphics processing units (GPUs) on graphics cards.

This paper discusses GPU implementations of three example applications from computational fluid dynamics, seismic wave propagation, and rock magnetism. These candidate applications involve important numerical modeling techniques, widely employed in physical system simulations, that are themselves examples of distinct computing classes identified as fundamental to scientific and engineering computing. The presented numerical methods (and respective computing classes they belong to) are: (1) a lattice-Boltzmann code for geofluid dynamics (structured grid class); (2) a spectral-finite-element code for seismic wave propagation simulations (sparse linear algebra class); and (3) a least-squares minimization code for interpreting magnetic force microscopy data (dense linear algebra class). Significant performance increases (between 10× and 30× in most cases) are seen in all three applications, demonstrating the power of GPU implementations for these types of simulations and, more generally, their associated computing classes. 

Paper available through ACM.

Fluid flow simulation on the Cell Broadband Engine using the lattice Boltzmann method (ACM)

Abstract:

In this paper we present a fast lattice Boltzmann fluid solver that has been performance optimized and tailored for the Cell Broadband Engine Architecture. Many design decisions were motivated by the long range objective to simulate blood flow in human blood vessels, especially in aneurysms, but have proven to be much more generally applicable. After explaining implementation details and how they were influenced by the target platform, the performance and memory requirements of this prototype solver are evaluated.

Paper available through ACM.

Performance and Accuracy of Lattice-Boltzmann Kernels on Multi- and Manycore Architectures

Abstract:

We present different kernels based on Lattice-Boltzmann methods for the solution of the two-dimensional Shallow Water and Navier-Stokes equations on fully structured lattices. The functionality ranges from simple scenarios like open-channel flows with planar beds to simulations with complex scene geometries like solid obstacles and non-planar bed topography with dry states and even interaction of the fluid with floating objects. The kernels are integrated into a hardware-oriented collection of libraries targeting multiple fundamentally different parallel hardware architectures like commodity multicore CPUs, the Cell BE, NVIDIA GPUs and clusters. We provide an algorithmic study which compares the different solvers in terms of performance and numerical accuracy in view of their capabilities and their specific implementation and optimisation on the different architectures. We show that an eightfold speedup over optimised multithreaded CPU code can be obtained with the GPU using basic methods and that even very complex flow phenomena can be simulated with significant speedups without loss of accuracy.

An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters

Abstract:

Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can accelerate simulation science applications tremendously. While multi-GPU workstations with several TeraFLOPS of peak computing power are available to accelerate computational problems, larger problems require even more resources. Conventional clusters of central processing units (CPU) are now being augmented with multiple GPUs in each compute-node to tackle large problems. The heterogeneous architecture of a multi-GPU cluster with a deep memory hierarchy creates unique challenges in developing scalable and efficient simulation codes. In this study, we pursue mixed MPI-CUDA implementations and investigate three strategies to probe the efficiency and scalability of incompressible flow computations on the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA). We exploit some of the advanced features of MPI and CUDA programming to overlap both GPU data transfer and MPI communications with computations on the GPU. We sustain approximately 2.4 TeraFLOPS on the 64 nodes of the NCSA Lincoln Tesla cluster using 128 GPUs with a total of 30,720 processing elements. Our results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics (CFD) simulations.
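To put the aggregate figures in this abstract into per-device terms, the short sketch below divides them out. The function name and the dictionary keys are hypothetical; the only inputs are the numbers quoted in the abstract (2.4 TeraFLOPS, 64 nodes, 128 GPUs, 30,720 processing elements), and the conversion assumes 1 TeraFLOPS = 1000 GigaFLOPS.

```python
def per_gpu_figures(sustained_tflops, nodes, gpus, processing_elements):
    """Break aggregate cluster figures down per node and per GPU."""
    return {
        "gpus_per_node": gpus / nodes,
        "elements_per_gpu": processing_elements / gpus,
        # 1 TeraFLOPS is taken as 1000 GigaFLOPS.
        "gflops_per_gpu": sustained_tflops * 1000.0 / gpus,
    }


# Figures quoted in the abstract above.
figures = per_gpu_figures(2.4, 64, 128, 30_720)
for name, value in figures.items():
    print(f"{name}: {value:.2f}")
```

With the quoted numbers this works out to 2 GPUs per node, 240 processing elements per GPU, and roughly 18.75 GigaFLOPS sustained per GPU.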

Porous Rock Simulations and Lattice Boltzmann on GPUs

Abstract:

Investigating how fluids flow inside the complicated geometries of porous rocks is an important problem in the petroleum industry. The lattice Boltzmann method (LBM) can be used to calculate porous rocks’ permeability. In this paper, we show how to implement this method efficiently on modern GPUs. Both a sequential CPU implementation and a parallelized GPU implementation are developed. Both implementations were tested using three porous data sets with known permeabilities. Our work shows that it is possible to calculate the permeability of porous rocks with simulation sizes up to 368³, which fit into the 4 GB memory of the NVIDIA Quadro FX 5800 card. Our single-precision floating-point simulation resulted in a respectable 0.95-1.59 MLUPS, whereas our GPU implementation achieved a remarkable 180+ MLUPS for several lattices in the 160³ to 368³ range, allowing calculations that would take hours on the CPU to be done in minutes on the GPU. Techniques for reducing round-off errors are also discussed and implemented.
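MLUPS stands for million lattice updates per second, so the quoted rates translate directly into run times. The sketch below works through that arithmetic for a cubic lattice with 368 cells per side (368 cubed). It reads the 0.95-1.59 MLUPS figure as the CPU rate, and the step count of 1000, the 19 distribution values per cell (a D3Q19 layout), and the function names are assumptions for illustration that the abstract does not state.

```python
def lattice_sites(cells_per_side):
    """Number of cells in a cubic lattice."""
    return cells_per_side ** 3


def seconds_for(sites, steps, mlups):
    """Run time when `mlups` million cell updates complete per second."""
    return sites * steps / (mlups * 1e6)


def distribution_bytes(sites, values_per_cell, bytes_per_value):
    """Memory for one copy of the distribution values."""
    return sites * values_per_cell * bytes_per_value


sites = lattice_sites(368)
steps = 1000  # hypothetical number of time steps

cpu_hours = seconds_for(sites, steps, 1.59) / 3600
gpu_minutes = seconds_for(sites, steps, 180.0) / 60
# Assumed layout: 19 single-precision (4-byte) values per cell.
gib = distribution_bytes(sites, 19, 4) / 1024 ** 3

print(f"cells: {sites}")
print(f"CPU at 1.59 MLUPS: {cpu_hours:.1f} hours")
print(f"GPU at 180 MLUPS: {gpu_minutes:.1f} minutes")
print(f"one copy of the distributions: {gib:.2f} GiB")
```

Under these assumptions the CPU run takes several hours and the GPU run a few minutes, in line with the hours-to-minutes claim, and a single copy of the distributions stays below 4 GiB.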