10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

A Study of Bootstrapping with Multiple Acoustic Features for Improved Automatic Speech Recognition

Xiaodong Cui, Jian Xue, Bing Xiang, Bowen Zhou

IBM T.J. Watson Research Center, USA

This paper investigates a bootstrapping scheme that uses multiple acoustic features (MFCC, PLP and LPCC) to improve the overall performance of automatic speech recognition. In this scheme, a Gaussian mixture distribution is estimated for each type of feature resampled in each HMM state by single-pass retraining on a shared decision tree. The acoustic models thus obtained from the multiple features are combined by likelihood averaging during decoding. Experiments on large vocabulary spontaneous speech recognition show that the combined system outperforms the best of the acoustic models built from individual features. It also achieves performance comparable to Recognizer Output Voting Error Reduction (ROVER) while offering computational advantages.
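The likelihood-averaging step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the combination weights, the feature dimensions, or the covariance structure, so the code assumes equal weights across feature streams and diagonal-covariance Gaussian mixtures. The function names (`diag_gmm_loglik`, `averaged_state_loglik`) and the dictionary layout are hypothetical and chosen only for illustration.

```python
import numpy as np


def logsumexp(values):
    """Numerically stable log(sum(exp(values))) for a 1-D array."""
    values = np.asarray(values, dtype=float)
    peak = np.max(values)
    return float(peak + np.log(np.sum(np.exp(values - peak))))


def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one frame x under a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D) for M mixture components.
    """
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    dim = x.shape[0]
    # Per-component Gaussian log-density, split into normaliser and exponent.
    log_norm = -0.5 * (dim * np.log(2.0 * np.pi) + np.sum(np.log(variances), axis=1))
    log_expo = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    return logsumexp(np.log(weights) + log_norm + log_expo)


def averaged_state_loglik(frames, gmms):
    """Average the per-stream likelihoods of one HMM state (assumed equal weights).

    frames: dict mapping a feature name (e.g. "mfcc") to that stream's frame vector.
    gmms:   dict mapping the same names to (weights, means, variances) tuples.
    Returns log( (1/K) * sum_k p_k(x_k) ), computed in the log domain.
    """
    per_stream = [diag_gmm_loglik(frames[name], *gmms[name]) for name in gmms]
    return logsumexp(per_stream) - float(np.log(len(per_stream)))
```

Because the averaging happens at the state-likelihood level on a shared decision tree, a single decoding pass suffices under this sketch, whereas ROVER combines the outputs of separate recognition passes; this is one plausible reading of the computational advantage the abstract mentions.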

Bibliographic reference. Cui, Xiaodong / Xue, Jian / Xiang, Bing / Zhou, Bowen (2009): "A study of bootstrapping with multiple acoustic features for improved automatic speech recognition", in INTERSPEECH-2009, 240-243.