cuHMM: A CUDA Implementation of Hidden Markov Model Training and Classification
Hidden Markov models (HMMs) are sequential classifiers with important applications in speech and language processing [Rab89] [JM08] and biological sequence analysis [Kro98]. In this project, we analyze the parallelism available in the three core algorithms for HMM training and classification, namely the forward algorithm, the Viterbi algorithm, and the Baum-Welch algorithm, on graphics processing units (GPUs). Based on this analysis, we implement a prototype program for HMM training and classification on the NVIDIA CUDA platform. Our evaluation shows that for the forward algorithm, the CUDA implementation achieves 23.3 GFLOP/s and up to an 800x speedup over a single-core CPU implementation; for the Baum-Welch algorithm, it achieves 4.3 GFLOP/s and a 200x speedup over the CPU implementation. We note that our implementation is not fully optimized: several forms of parallelism identified during the analysis were left unimplemented due to time and complexity constraints, so we expect a more sophisticated implementation to exploit the GPU's computing ability further.

The remaining sections are organized as follows. Section 2 gives a brief introduction to hidden Markov models, describes the three most important HMM algorithms, and presents our analysis of their parallelism. Section 3 describes our implementation in detail. Section 4 evaluates the implementation experimentally and discusses the results. Section 5 concludes.