Stories, Papers, WIKIs

Initial List of GPU Bioinformatics Papers

Smith-Waterman

  • Y. Liu, D. Maskell, B. Schmidt: "CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units", BMC Research Notes, 2:73, 2009
  • L. Ligowski, W. Rudnicki. An efficient implementation of Smith-Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases. In IEEE International Workshop on High Performance Computational Biology (HiCOMB 2009), 2009.
  • S. Manavski, G. Valle. CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment. BMC Bioinformatics, 9 Suppl 2:S10, 2008
  • W. Liu, B. Schmidt, G. Voss, A. Schroeder, W. Mueller-Wittig: "Bio-Sequence Database Scanning on a GPU", 5th IEEE International Workshop on High Performance Computational Biology (HiCOMB 2006), Rhodes Island, Greece, IEEE Press, 2006

Multiple Sequence Alignment

  • Y. Liu, B. Schmidt, D.L. Maskell: MSA-CUDA: Multiple Sequence Alignment on Graphics Processing Units with CUDA, 20th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2009), Boston, MA, IEEE Press

HMMER

  • J. P. Walters, V. Balu, S. Kompalli, and V. Chaudhary: “Evaluating the use of GPUs in Liver Image Segmentation and HMMER Database Searches”, IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS), May 25-29, 2009, Rome, Italy.

Next-generation Sequencing

  • M.C. Schatz, C. Trapnell, A.L. Delcher, A. Varshney: "High-throughput sequence alignment using graphics processing units". BMC Bioinformatics, 8:474, 2007.
  • M.C. Schatz, C. Trapnell: "Optimizing data intensive GPGPU computations for DNA sequence alignment". Parallel Computing, vol. 35, pages 429-440, 2009
  • H. Shi, B. Schmidt, W. Liu, K. Mueller-Wittig: "Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA", 8th IEEE International Workshop on High Performance Computational Biology (HiCOMB 2009), Rome, Italy, IEEE Press

Phylogeny Reconstruction


  • M.A. Suchard and A. Rambaut. Many-core algorithms for statistical phylogenetics. Bioinformatics, 2009, 25(11): 1370-1376

  • M. Charalambous, P. Trancoso, A. Stamatakis. Initial experiences porting a bioinformatics application to a graphics processor. Advances in Informatics, pages 415-425, 2005

RNA

  • G. Rizk, D. Lavenier: "GPU accelerated RNA folding algorithm". LNCS 5544, Baton Rouge, Louisiana, U.S.A., 2009
  • G. Rizk, D. Lavenier: "GPU accelerated RNA-RNA interaction algorithm", EMBnet Conference 2008: Leading Applications and Technologies in Bioinformatics, Martina Franca (Taranto), Italy, 2008

Motif finding

  • Y. Liu, B. Schmidt, W. Liu, D. Maskell: "CUDA-MEME: Accelerating Motif Discovery in Biological Sequences Using CUDA-enabled Graphics Processing Units", Pattern Recognition Letters, in press, doi:10.1016/j.patrec.2009.10.009

Position Weight Matrices

  • M. Giraud, J.-S. Varré: "Parallel position weight matrices algorithms". In International Symposium on Parallel and Distributed Computing (ISPDC 2009), 2009

Biomanycores

Biomanycores (http://www.biomanycores.org/) provides Java, Perl, and Python interfaces to a number of open-source parallel bioinformatics programs written in CUDA/OpenCL. The Biomanycores project aims to bridge the gap between HPC research and the platforms used by everyday bioinformaticians and biologists, in particular by giving access to high-performance prototypes through the Bio* frameworks.


 

Resources on NVIDIA's website

Check out the Tesla Bio Workbench for some publicly available GPU tools in Bioinformatics and in Molecular Dynamics. A list of links to related papers can be found at http://www.nvidia.com/object/bio_info_life_sciences.html.

Massive Threading: Using GPUs to Increase the Performance of Digital Forensics Tools

Abstract:

The current generation of Graphics Processing Units (GPUs) contains a large number of general-purpose processors, in sharp contrast to previous generation designs, where special-purpose hardware units (such as texture and vertex shaders) were commonly used. This fact, combined with the prevalence of multicore general-purpose CPUs in modern workstations, suggests that performance-critical software such as digital forensics tools be “massively” threaded to take advantage of all available computational resources.
Several trends in digital forensics make the availability of more processing power very important. These trends include a large increase in the average size (measured in bytes) of forensic targets, an increase in the number of digital forensics cases, and the development of “next generation” tools that require more computational resources. This paper presents the results of a number of experiments that evaluate the effectiveness of offloading processing common to digital forensics tools to a GPU, using “massive” numbers of threads to parallelize the computation. These results are compared to speedups obtainable by simple threading schemes appropriate for multicore CPUs. Our results indicate that in many cases, the use of GPUs can substantially increase the performance of digital forensics tools.
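A simple threading scheme of the kind the abstract compares against can be sketched as a chunked search over a byte buffer. This is a minimal sketch, assuming the workload is a byte-signature search (as in file carving); the functions `find_in_range` and `parallel_find` are hypothetical and not taken from the paper. Each chunk's search window extends `len(pattern) - 1` bytes past its end, so a match straddling a chunk boundary is reported exactly once, by the chunk in which it starts. In CPython the global interpreter lock limits the real speedup of this code; the partitioning is the part that carries over to native threads or to one GPU thread per block of data.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in_range(data: bytes, pattern: bytes, start: int, end: int) -> list[int]:
    """Offsets of every occurrence of `pattern` that begins in [start, end)."""
    # Extend the window len(pattern) - 1 bytes past `end` so a match that
    # straddles the boundary is found by the chunk in which it starts.
    limit = min(len(data), end + len(pattern) - 1)
    hits = []
    i = data.find(pattern, start, limit)
    while i != -1:
        hits.append(i)
        i = data.find(pattern, i + 1, limit)
    return hits


def parallel_find(data: bytes, pattern: bytes, n_workers: int = 4) -> list[int]:
    """Split `data` into n_workers chunks and search them concurrently (hypothetical helper)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    chunk = max(1, -(-len(data) // n_workers))  # ceiling division
    bounds = [(s, min(s + chunk, len(data))) for s in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda b: find_in_range(data, pattern, *b), bounds))
    return sorted(h for part in parts for h in part)
```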

High-Throughput Sequence Alignment Using Graphics Processing Units

Background
The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies.
Results
This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies.
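The query-parallel idea can be sketched on the CPU: build one index of the reference, then match every query independently against it. This is a minimal sketch under stated assumptions: it swaps in a plain suffix array with binary search for MUMmerGPU's suffix tree, it reports only the longest exact match starting at each query position (a simplification of MUMmer's maximal-match output), and the function names and `min_len` threshold are hypothetical. The thread pool stands in for one GPU thread per query; under CPython's global interpreter lock it shows the decomposition rather than a real speedup.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor


def build_index(ref: str):
    """Suffix array of the reference plus its sorted suffixes (fine for a toy reference)."""
    sa = sorted(range(len(ref)), key=lambda i: ref[i:])
    return sa, [ref[i:] for i in sa]


def common_prefix_len(x: str, y: str) -> int:
    n = 0
    for cx, cy in zip(x, y):
        if cx != cy:
            break
        n += 1
    return n


def match_query(query: str, index, min_len: int):
    """(query_pos, ref_pos, length) for the longest exact match at each query position."""
    sa, suffixes = index
    hits = []
    for qs in range(len(query)):
        q = query[qs:]
        k = bisect_left(suffixes, q)
        best_len, best_pos = 0, -1
        # The suffix sharing the longest prefix with q is adjacent to q's insertion point.
        for j in (k - 1, k):
            if 0 <= j < len(suffixes):
                length = common_prefix_len(q, suffixes[j])
                if length > best_len:
                    best_len, best_pos = length, sa[j]
        if best_len >= min_len:
            hits.append((qs, best_pos, best_len))
    return hits


def align_all(queries, ref: str, min_len: int = 3, workers: int = 4):
    index = build_index(ref)  # built once, shared read-only by every worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: match_query(q, index, min_len), queries))
```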
Conclusion

MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. 

GPU Computing for Systems Biology

Note: Requires a subscription to Briefings in Bioinformatics Online to view.

 

The development of detailed, coherent models of complex biological systems is recognized as a key requirement for integrating the increasing amount of experimental data. In addition, in-silico simulation of biochemical models provides an easy way to test different experimental conditions, helping in the discovery of the dynamics that regulate biological systems. However, the computational power required by these simulations often exceeds that available on common desktop computers, and thus expensive high-performance computing solutions are required. An emerging alternative is general-purpose scientific computing on graphics processing units (GPGPU), which offers the power of a small computer cluster at a cost of ~$400. Computing with a GPU requires the development of specific algorithms, since the programming paradigm differs substantially from traditional CPU-based computing. In this paper, we review some recent efforts in exploiting the processing power of GPUs for the simulation of biological systems.

GPU Acceleration of Iterative Digital Breast Tomosynthesis with Error Checking

   Digital Breast Tomosynthesis (DBT) is a technology that mitigates many of the shortcomings associated with traditional mammography. Using multiple low-dose x-ray projections with an iterative maximum likelihood estimation method, DBT is able to create a high-quality, three-dimensional reconstruction of the breast. However, the clinical viability of DBT depends largely on reducing its execution time to a level acceptable in a clinical setting.

   In this work we accelerate our DBT algorithm on the latest generation of NVIDIA’s CUDA-enabled GPUs, reducing the execution time to under 20 seconds for eight iterations (the number usually required to obtain a clean reconstruction). Moreover, with the execution time substantially decreased, a number of additional benefits become achievable, such as using redundant computations to prevent inaccuracies or artifacts that can be introduced by transient faults or other memory errors during execution. We also supply the high-level algorithms and thread-mapping strategy (for both the CPU and GPUs) for creating a multiple-GPU version of the algorithm, and discuss how these choices play to the strengths of the GPU architecture.
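The redundant-computation idea can be illustrated generically: run each step twice and accept the result only if both runs agree within a tolerance, otherwise recompute. This is a minimal dual-execution sketch, assuming a deterministic step that returns a sequence of floats (for example, reconstructed voxel values); the name `run_with_redundancy`, the retry limit, and the tolerances are hypothetical, and the paper's actual error-checking scheme may differ (it might, for instance, compare runs on different GPUs).

```python
import math
from typing import Callable, Sequence


def run_with_redundancy(step: Callable[..., Sequence[float]], *args,
                        max_attempts: int = 3, rel_tol: float = 1e-6,
                        abs_tol: float = 1e-9) -> list[float]:
    """Run `step` twice per attempt; return the result once both runs agree."""
    for _ in range(max_attempts):
        first = list(step(*args))
        second = list(step(*args))
        # A transient fault in either run shows up as a disagreement here.
        if len(first) == len(second) and all(
            math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
            for x, y in zip(first, second)
        ):
            return first
    raise RuntimeError(f"results disagreed in all {max_attempts} attempts")
```

A tolerance is used instead of exact equality because floating-point results on a GPU are not always bit-identical between runs (for example, when reductions or atomics change the summation order).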

Speeding Up Evolutionary Learning Algorithms using GPUs

Abstract:  

 

This paper proposes a multithreaded Genetic Programming classification evaluation model using NVIDIA CUDA GPUs to reduce computational time, which is poor for large problems. Two different classification algorithms are benchmarked using UCI Machine Learning data sets. Experimental results compare the performance of single-threaded and multithreaded Java, C, and GPU code and show that our proposal is far more efficient.
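The evaluation step that the paper offloads to the GPU can be sketched as follows. This is a minimal illustration, assuming tree-based GP individuals written as nested tuples in prefix form, variables named `x0`, `x1`, and so on, and a rule that predicts class 1 when the expression's output is positive; these choices and the function names are hypothetical, not the paper's representation. The point it shows is that every (individual, training instance) pair is evaluated independently, which is what makes this step suitable for massive threading.

```python
import operator

# Hypothetical primitive set for the sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "max": max, "min": min}


def evaluate(expr, row):
    """Interpret a prefix-form expression tree on one data instance."""
    if isinstance(expr, tuple):           # (op, arg1, arg2, ...)
        op, *args = expr
        return OPS[op](*(evaluate(a, row) for a in args))
    if isinstance(expr, str):             # variable "x<k>" reads feature k
        return row[int(expr[1:])]
    return expr                           # numeric constant


def fitness(expr, data, labels):
    """Classification accuracy of one individual (predict 1 when output > 0)."""
    correct = 0
    for row, label in zip(data, labels):
        # Each (individual, instance) evaluation is independent: on a GPU this
        # loop body could be one thread, with the count formed by a reduction.
        predicted = 1 if evaluate(expr, row) > 0 else 0
        correct += predicted == label
    return correct / len(data)


def population_fitness(population, data, labels):
    """Fitness of every individual; individuals are also independent of each other."""
    return [fitness(expr, data, labels) for expr in population]
```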

CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units
 
 
Background
The Smith-Waterman algorithm is one of the most widely used tools for searching biological sequence databases due to its high sensitivity. Unfortunately, the Smith-Waterman algorithm is computationally demanding, which is further compounded by the exponential growth of sequence databases. The recent emergence of many-core architectures, and their associated programming interfaces, provides an opportunity to accelerate sequence database searches using commonly available and inexpensive hardware.
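For reference, the Smith-Waterman recurrence in its simplest form, with a substitution score s(a_i, b_j) and a linear gap penalty g > 0, is shown below. Protein database search tools such as those compared here typically use a substitution matrix (e.g., BLOSUM62) and affine gap penalties instead, so treat this as the basic variant only:

```latex
H_{i,0} = H_{0,j} = 0, \qquad
H_{i,j} = \max\begin{cases}
  0 \\
  H_{i-1,j-1} + s(a_i, b_j) \\
  H_{i-1,j} - g \\
  H_{i,j-1} - g
\end{cases}, \qquad
\text{score} = \max_{i,j} H_{i,j}
```

Aligning a query of length m against a database sequence of length n fills m × n cells, so the cost of a full database search grows with the query length times the total number of database residues. This is why performance is reported in cell updates per second.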
 
 
Findings
Our CUDASW++ implementation (benchmarked on a single-GPU NVIDIA GeForce GTX 280 graphics card and a dual-GPU GeForce GTX 295 graphics card) provides a significant performance improvement compared to other publicly available implementations, such as SWPS3, CBESW, SW-CUDA, and NCBI-BLAST. CUDASW++ supports query sequences of length up to 59K. For query sequences ranging in length from 144 to 5,478 searched against Swiss-Prot release 56.6, the single-GPU version achieves an average performance of 9.509 GCUPS (lowest 9.039 GCUPS, highest 9.660 GCUPS), and the dual-GPU version achieves an average performance of 14.484 GCUPS (lowest 10.660 GCUPS, highest 16.087 GCUPS).
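GCUPS (billions of cell updates per second) counts dynamic-programming matrix cells computed per second: searching a query of length m against a database containing N residues in total updates m × N cells. The sketch below shows only that arithmetic; the function name and the numbers in the example are hypothetical, and the exact timing methodology behind the figures above (for example, whether host-device transfers are included) is not stated here.

```python
def gcups(query_len: int, db_residues: int, seconds: float) -> float:
    """Giga cell updates per second for one query-versus-database search."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # One dynamic-programming cell per (query residue, database residue) pair.
    cell_updates = query_len * db_residues
    return cell_updates / (seconds * 1e9)
```

For a hypothetical 500-residue query searched against a 100-million-residue database in 5 seconds, `gcups(500, 100_000_000, 5.0)` gives 10.0.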
 
 
Conclusion
CUDASW++ is publicly available open-source software. It provides a significant performance improvement for Smith-Waterman-based protein sequence database searches by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs.

 

An efficient implementation of Smith-Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases

The Smith-Waterman algorithm for sequence alignment is one of the main tools of bioinformatics. It is used for sequence similarity searches and alignment of similar sequences. High-end Graphics Processing Units (GPUs), used for processing graphics on desktop computers, deliver computational capabilities exceeding those of CPUs by an order of magnitude. Recently these capabilities became accessible for general-purpose computations thanks to the CUDA programming environment on NVIDIA GPUs and the ATI Stream Computing environment on ATI GPUs. Here we present an efficient implementation of the Smith-Waterman algorithm on the NVIDIA GPU. The algorithm achieves more than 3.5 times higher per-core performance than the previously published implementation of the Smith-Waterman algorithm on GPU, reaching more than 70% of theoretical hardware performance. The differences between current and earlier approaches are described, providing an example of how to write efficient code for the GPU.
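One common way to expose parallelism inside a single Smith-Waterman alignment is to fill the score matrix along anti-diagonals, because all cells with the same i + j depend only on the two previous anti-diagonals. The sketch below illustrates that ordering with a linear gap penalty and simple match/mismatch scores. It is not necessarily the strategy used in this paper (many GPU implementations instead assign a different database sequence to each thread), and the function name and scoring defaults are hypothetical.

```python
def sw_score_wavefront(a: str, b: str, match: int = 2, mismatch: int = -1,
                       gap_penalty: int = 1) -> int:
    """Smith-Waterman local alignment score (linear gaps), filled by anti-diagonals."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best = 0
    for d in range(2, n + m + 1):              # d = i + j indexes one anti-diagonal
        # Cells on diagonal d read only diagonals d-1 and d-2, so the
        # iterations of this inner loop are independent and could run
        # as separate threads.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] - gap_penalty,
                          H[i][j - 1] - gap_penalty)
            best = max(best, H[i][j])
    return best
```

The result is identical to a row-by-row fill; only the order of computation changes, which is what allows one anti-diagonal to be processed in parallel.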