Stories, Papers, WIKIs

Initial List of GPU Molecular Modeling Papers
  • Probing Biomolecular Machines with Graphics Processors. James C. Phillips, John E. Stone. Communications of the ACM 52(10):34-41, 2009.
  • GPU Clusters for High Performance Computing. Volodymyr Kindratenko, Jeremy Enos, Guochun Shi, Michael Showerman, Galen Arnold, John E. Stone, James Phillips, Wen-mei Hwu. In Workshop on Parallel Programming on Accelerator Clusters (PPAC), IEEE Cluster 2009. In press.
  • Friedrichs, M.S., Eastman, P., Vaidyanathan, V., Houston, M., Legrand, S., Beberg, A.L., Ensign, D.L., Bruns, C.M., and Pande, V.S. Accelerating molecular dynamic simulation on graphics processing units. Journal of Computational Chemistry 30, 6 (2009), 864–872.
  • Long time-scale simulations of in vivo diffusion using GPU hardware. Elijah Roberts, John E. Stone, Leonardo Sepulveda, Wen-mei W. Hwu, and Zaida Luthey-Schulten. In IPDPS '09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, pp. 1-8, 2009.
  • High performance computation and interactive display of molecular orbitals on GPUs and multi-core CPUs. John E. Stone, Jan Saam, David J. Hardy, Kirby L. Vandivort, Wen-mei W. Hwu, and Klaus Schulten. In Proceedings of the 2nd Workshop on General-Purpose Processing on Graphics Processing Units, ACM International Conference Proceeding Series, volume 383, pp. 9-18, 2009.
  • Multilevel summation of electrostatic potentials using graphics processing units. David J. Hardy, John E. Stone, and Klaus Schulten. Parallel Computing, 35:164-177, 2009.
  • Dynerman, D., Butzlaff, E., and Mitchell, J.C. CUSA and CUDE: GPU-accelerated methods for estimating solvent accessible surface area and desolvation. Journal of Computational Biology 16, 4 (2009), 523–537.
  • Ufimtsev, I.S. and Martinez, T.J. Quantum chemistry on graphical processing units. Strategies for two-electron integral evaluation. Journal of Chemical Theory and Computation 4, 2 (2008), 222–231.
  • Adapting a message-driven parallel application to GPU-accelerated clusters. James C. Phillips, John E. Stone, and Klaus Schulten. In SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing, Piscataway, NJ, USA, 2008. IEEE Press.
  • GPU acceleration of cutoff pair potentials for molecular modeling applications. Christopher I. Rodrigues, David J. Hardy, John E. Stone, Klaus Schulten, and Wen-mei W. Hwu. In CF'08: Proceedings of the 2008 conference on Computing Frontiers, pp. 273-282, New York, NY, USA, 2008. ACM.
  • GPU computing. John D. Owens, Mike Houston, David Luebke, Simon Green, John E. Stone, and James C. Phillips. Proceedings of the IEEE, 96:879-899, 2008.
  • Anderson, J.A., Lorenz, C.D., and Travesset, A. General-purpose molecular dynamics simulations fully implemented on graphics processing units. Journal of Computational Physics 227, 10 (2008), 5342–5359.
  • Elsen, E., Vishal, V., Houston, M., Pande, V., Hanrahan, P., and Darve, E. N-body simulations on GPUs. Technical Report, Stanford University (June 2007); http://arxiv.org/abs/0706.3060.
  • Continuous fluorescence microphotolysis and correlation spectroscopy using 4Pi microscopy. Anton Arkhipov, Jana Hüve, Martin Kahms, Reiner Peters, and Klaus Schulten. Biophysical Journal, 93:4006-4017, 2007.
  • Accelerating molecular modeling applications with graphics processors. John E. Stone, James C. Phillips, Peter L. Freddolino, David J. Hardy, Leonardo G. Trabuco, and Klaus Schulten. Journal of Computational Chemistry, 28:2618-2640, 2007.

 

Introducing GMAC

We are proud to announce the first public version of GMAC.

GMAC is a user-level library that implements an Asymmetric Distributed Shared Memory model to be used by CUDA programs. An ADSM model builds a global memory space that allows CPU code to transparently access data hosted in accelerators' (GPUs) memories. Moreover, the coherency of the data is automatically handled by the library. This removes the necessity for manual memory transfers (cudaMemcpy) between the host and GPU memories.

GMAC is being developed by the Operating System Group at the Universitat Politecnica de Catalunya and the IMPACT Research Group at the University of Illinois under the University of Illinois/NCSA Open Source License.

The project is hosted here. There you can find documentation, code and pre-built Debian packages.

High Performance Computation and Interactive Display of Molecular Orbitals on GPUs and Multi-core CPUs

Abstract:

The visualization of molecular orbitals (MOs) is important for analyzing the results of quantum chemistry simulations. The functions describing the MOs are computed on a three-dimensional lattice, and the resulting data can then be used for plotting isocontours or isosurfaces for visualization as well as for other types of analyses. Existing software packages that render MOs perform calculations on the CPU and require runtimes of tens to hundreds of seconds depending on the complexity of the molecular system. We present novel data-parallel algorithms for computing lattices of MOs on modern graphics processing units (GPUs) and multi-core CPUs. The fastest GPU algorithm achieves up to a 125-fold speedup over an optimized CPU implementation running on one CPU core. We also demonstrate possible benefits of dynamic GPU kernel generation and just-in-time compilation for MO calculation. We have implemented these algorithms within the popular molecular visualization program VMD, which can now produce high quality MO renderings for large systems in less than a second, and achieves the first-ever interactive animations of quantum chemistry simulation trajectories using only on-the-fly calculation.
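To make the lattice computation concrete, here is a minimal Python sketch of evaluating an orbital on a regular three-dimensional lattice. It is not the algorithm implemented in VMD: it assumes the orbital is a linear combination of unnormalized s-type Gaussian primitives only, and the function name mo_on_lattice and all of its parameters are hypothetical. Each lattice point is computed independently of every other point, which is the property a data-parallel implementation can exploit.

```python
import math


def mo_on_lattice(centers, exponents, coeffs, origin, spacing, shape):
    """Evaluate psi(r) = sum_i c_i * exp(-alpha_i * |r - R_i|^2) on a lattice.

    Assumption for illustration: only unnormalized s-type Gaussian
    primitives are used; real basis sets also contain p, d, ... shells.
    """
    nx, ny, nz = shape
    ox, oy, oz = origin
    lattice = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for ix in range(nx):
        x = ox + ix * spacing
        for iy in range(ny):
            y = oy + iy * spacing
            for iz in range(nz):
                z = oz + iz * spacing
                # The work for one lattice point: a sum over all primitives.
                value = 0.0
                for (cx, cy, cz), alpha, c in zip(centers, exponents, coeffs):
                    r2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                    value += c * math.exp(-alpha * r2)
                lattice[ix][iy][iz] = value
    return lattice


# Two primitives with opposite signs give an orbital with a nodal plane at x = 0.
grid = mo_on_lattice(
    centers=[(-0.5, 0.0, 0.0), (0.5, 0.0, 0.0)],
    exponents=[1.0, 1.0],
    coeffs=[1.0, -1.0],
    origin=(-1.0, -1.0, -1.0),
    spacing=1.0,
    shape=(3, 3, 3),
)
```

The resulting nested list can be handed to an isosurface extractor. On a GPU, the three outer loops would typically map to threads, with the inner sum over primitives performed by each thread.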

Slides from GPGPU-2

GPU Acceleration of a Production Molecular Docking Code

Abstract:

Modeling the interactions of biological molecules, or docking, is critical both to understanding basic life processes and to designing new drugs. Here we describe the GPU-based acceleration of a recently developed, complex, production docking code. We show how the various functions can be mapped to the GPU and present numerous optimizations. We find which parts of the problem domain are best suited to the different correlation methods. The GPU-accelerated system achieves a speedup of at least 17.7x with respect to a single core and 6.1x with respect to four cores for all likely problem sizes. This makes it competitive with FPGA-based systems for small molecule docking, and superior for protein-protein docking.

Slides from GPGPU-2

GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model

Abstract:

The compute unified device architecture (CUDA) is a programming approach for performing scientific calculations on a graphics processing unit (GPU) as a data-parallel computing device. The programming interface allows algorithms to be implemented using extensions to the standard C language. With a continuously increasing number of cores in combination with a high memory bandwidth, a recent GPU offers incredible resources for general purpose computing. First, we apply this new technology to Monte Carlo simulations of the two dimensional ferromagnetic square lattice Ising model. By implementing a variant of the checkerboard algorithm, results are obtained up to 60 times faster on the GPU than on a current CPU core. An implementation of the three dimensional ferromagnetic cubic lattice Ising model on a GPU is able to generate results up to 35 times faster than on a current CPU core. As proof of concept we calculate the critical temperature of the 2D and 3D Ising model using finite size scaling techniques. Theoretical results for the 2D Ising model and previous simulation results for the 3D Ising model can be reproduced.
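The checkerboard idea mentioned in the abstract can be sketched in a few lines of serial Python. This is a generic textbook Metropolis update, not the paper's CUDA variant: it assumes coupling J = 1, zero external field, periodic boundaries, and an even lattice size, and the names checkerboard_sweep and magnetization are hypothetical. The point it illustrates is that sites of one colour have all four neighbours on the other colour, so every site of a colour could be updated at the same time.

```python
import math
import random


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of a periodic L x L Ising lattice, colour by colour.

    Assumptions: J = 1, no external field, L even so the two colours
    stay consistent across the periodic boundary.
    """
    size = len(spins)
    for color in (0, 1):
        # Within one colour no site is a neighbour of another, so this
        # double loop is what a GPU would run as one parallel kernel.
        for i in range(size):
            for j in range(size):
                if (i + j) % 2 != color:
                    continue
                neighbours = (
                    spins[(i + 1) % size][j]
                    + spins[(i - 1) % size][j]
                    + spins[i][(j + 1) % size]
                    + spins[i][(j - 1) % size]
                )
                # Energy change if this spin were flipped.
                delta_e = 2.0 * spins[i][j] * neighbours
                if delta_e <= 0.0 or rng.random() < math.exp(-beta * delta_e):
                    spins[i][j] = -spins[i][j]


def magnetization(spins):
    """Mean spin per site."""
    size = len(spins)
    return sum(sum(row) for row in spins) / (size * size)


rng = random.Random(0)
lattice = [[1] * 8 for _ in range(8)]
for _ in range(10):
    checkerboard_sweep(lattice, beta=0.2, rng=rng)
```

Repeating such sweeps at several temperatures and lattice sizes and recording the magnetization is the kind of data a finite size scaling analysis starts from.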

Accelerating geoscience and engineering system simulations on graphics hardware (ACM)

Abstract:

Many complex natural systems studied in the geosciences are characterized by simple local-scale interactions that result in complex emergent behavior. Simulations of these systems, often implemented in parallel using standard central processing unit (CPU) clusters, may be better suited to parallel processing environments with large numbers of simple processors. Such an environment is found in graphics processing units (GPUs) on graphics cards.

This paper discusses GPU implementations of three example applications from computational fluid dynamics, seismic wave propagation, and rock magnetism. These candidate applications involve important numerical modeling techniques, widely employed in physical system simulations, that are themselves examples of distinct computing classes identified as fundamental to scientific and engineering computing. The presented numerical methods (and respective computing classes they belong to) are: (1) a lattice-Boltzmann code for geofluid dynamics (structured grid class); (2) a spectral-finite-element code for seismic wave propagation simulations (sparse linear algebra class); and (3) a least-squares minimization code for interpreting magnetic force microscopy data (dense linear algebra class). Significant performance increases (between 10× and 30× in most cases) are seen in all three applications, demonstrating the power of GPU implementations for these types of simulations and, more generally, their associated computing classes. 

Paper available through ACM.

Multi-scale HPC system for multi-scale discrete simulation—Development and application of a supercomputer with 1 Petaflops peak performance in single precision

Abstract:

A supercomputer with 1.0 Petaflops peak performance in single precision, designed and established by the Institute of Process Engineering (IPE), Chinese Academy of Sciences, is introduced in this brief communication. A design philosophy utilizing the similarity between hardware, software and the problems to be solved is embodied, based on the multi-scale method and discrete simulation approaches developed at IPE and implemented in a graphics processing unit (GPU)-based hybrid computing mode. The preliminary applications of this machine in areas such as multi-phase flow and molecular dynamics are reported, demonstrating the supercomputer as a paradigm of green computation in new architecture.

Accelerating Molecular Dynamic Simulation on Graphics Processing Units

Abstract:

We describe a complete implementation of all-atom protein molecular dynamics running entirely on a graphics processing unit (GPU), including all standard force field terms, integration, constraints, and implicit solvent. We discuss the design of our algorithms and important optimizations needed to fully take advantage of a GPU. We evaluate its performance, and show that it can be more than 700 times faster than a conventional implementation running on a single CPU core.

CUSA and CUDE: GPU-Accelerated Methods for Estimating Solvent Accessible Surface Area and Desolvation

Abstract:
It is well-established that a linear correlation exists between accessible surface areas and experimentally measured solvation energies. Combining this knowledge with an analytic formula for calculation of solvent accessible surfaces, we derive a simple model of desolvation energy as a differentiable function of atomic positions. Additionally, we find that this algorithm is particularly well suited for hardware acceleration on graphics processing units (GPUs), outperforming the CPU by up to two orders of magnitude. We explore the scaling of this desolvation algorithm and provide implementation details applicable to general pairwise algorithms.

Quantum Chemistry on Graphical Processing Units. Strategies for Two-Electron Integral Evaluation

Abstract:

Modern videogames place increasing demands on the computational and graphical hardware, leading to novel architectures that have great potential in the context of high performance computing and molecular simulation. We demonstrate that Graphical Processing Units (GPUs) can be used very efficiently to calculate two-electron repulsion integrals over Gaussian basis functions, the first step in most quantum chemistry calculations. A benchmark test performed for the evaluation of approximately 10^6 (ss|ss) integrals over contracted s-orbitals showed that a naïve algorithm implemented on the GPU achieves up to 130-fold speedup over a traditional CPU implementation on an AMD Opteron. Subsequent calculations of the Coulomb operator for a 256-atom DNA strand show that the GPU advantage is maintained for basis sets including higher angular momentum functions.
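For readers unfamiliar with the (ss|ss) integrals named in the abstract, the following Python sketch evaluates one such integral with the standard closed-form expression for primitive s-type Gaussians. It is not the paper's GPU algorithm: it assumes unnormalized primitives of the form exp(-alpha |r - A|^2), and the function names boys_f0, dist2, product_center, and ssss are hypothetical. An integral over contracted s-orbitals would be a weighted sum of such primitive integrals, which is omitted here.

```python
import math


def boys_f0(t):
    """Zeroth-order Boys function F0(t) = integral over u in [0, 1] of exp(-t u^2)."""
    if t < 1e-12:
        return 1.0 - t / 3.0  # leading terms of the series near t = 0
    return 0.5 * math.sqrt(math.pi / t) * math.erf(math.sqrt(t))


def dist2(a, b):
    """Squared distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def product_center(alpha, a, beta, b):
    """Center of the Gaussian obtained by multiplying two s Gaussians."""
    p = alpha + beta
    return tuple((alpha * x + beta * y) / p for x, y in zip(a, b))


def ssss(alpha, a, beta, b, gamma, c, delta, d):
    """Two-electron repulsion integral (ab|cd) over unnormalized primitive s Gaussians."""
    p = alpha + beta
    q = gamma + delta
    center_p = product_center(alpha, a, beta, b)
    center_q = product_center(gamma, c, delta, d)
    prefactor = 2.0 * math.pi ** 2.5 / (p * q * math.sqrt(p + q))
    k_ab = math.exp(-alpha * beta / p * dist2(a, b))
    k_cd = math.exp(-gamma * delta / q * dist2(c, d))
    return prefactor * k_ab * k_cd * boys_f0(p * q / (p + q) * dist2(center_p, center_q))


origin = (0.0, 0.0, 0.0)
same_center = ssss(1.0, origin, 1.0, origin, 1.0, origin, 1.0, origin)
```

Every quadruple of primitives is evaluated with the same arithmetic and no dependence on other quadruples, which suggests why batches of around 10^6 such integrals suit a GPU.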
