9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

GPU-Accelerated Gaussian Clustering for fMPE Discriminative Training

Yu Shi, Frank Seide, Frank K. Soong

Microsoft Research Asia, China

The Graphics Processing Unit (GPU) has extended its applications from its original purpose of graphics rendering to more general scientific computation. Through massive parallelization, state-of-the-art GPUs can deliver 200 billion floating-point operations per second (0.2 TFLOPS) on a single consumer-priced graphics card. This paper describes our attempt to leverage GPUs for efficient HMM model training. We show that GPUs can be highly beneficial for one specific task: the Gaussian clustering required by fMPE (feature-domain Minimum Phone Error) discriminative training. Clustering a huge number of Gaussians is very time-consuming because of the enormous model sizes in current LVCSR systems. Comparing an implementation on an NVIDIA GeForce 8800 Ultra GPU against one on an Intel Pentium 4, we find that our brute-force GPU implementation is 14 times faster overall than a CPU implementation that uses approximate speed-up heuristics. GPU-accelerated fMPE reduces the word error rate (WER) by 6% relative to the maximum-likelihood-trained baseline on two conversational speech recognition tasks.
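
To make the term "brute-force" concrete, here is a minimal CPU-side sketch of the assignment step in Gaussian clustering: every Gaussian is compared against every centroid, with no pruning heuristics. The abstract does not state the distance measure or the data layout, so the symmetric KL divergence between diagonal-covariance Gaussians, the `(mean, variance)` tuple representation, and all function names below are assumptions made for illustration only, not the authors' implementation. Each (Gaussian, centroid) distance is independent of the others, which is the property that makes this kind of exhaustive search a natural fit for massively parallel hardware.

```python
import math

def kl_diag(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two diagonal-covariance Gaussians (assumed distance basis)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kl(g, h):
    """Symmetric KL divergence; g and h are (mean, variance) tuples."""
    return kl_diag(g[0], g[1], h[0], h[1]) + kl_diag(h[0], h[1], g[0], g[1])

def assign_brute_force(gaussians, centroids):
    """Exhaustively find the closest centroid for every Gaussian (no pruning)."""
    assignments = []
    for g in gaussians:
        # Every (Gaussian, centroid) distance is independent of all the others.
        best = min(range(len(centroids)),
                   key=lambda k: symmetric_kl(g, centroids[k]))
        assignments.append(best)
    return assignments

# Hypothetical toy data: three 2-dimensional Gaussians, two centroids.
gaussians = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([0.2, -0.1], [1.0, 1.2]),
    ([5.0, 5.0], [0.5, 0.5]),
]
centroids = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([5.0, 5.0], [0.5, 0.5]),
]
assignments = assign_brute_force(gaussians, centroids)
```

In this toy setup the first two Gaussians land in cluster 0 and the third in cluster 1. The cost grows with the product of the number of Gaussians and the number of centroids, which is why CPU implementations tend to resort to approximate heuristics, while a parallel implementation can afford the exhaustive search.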


Bibliographic reference.  Shi, Yu / Seide, Frank / Soong, Frank K. (2008): "GPU-accelerated Gaussian clustering for fMPE discriminative training", In INTERSPEECH-2008, 944-947.