13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

A Study on Using Word-Level HMMs to Improve ASR Performance over State-of-the-Art Phone-Level Acoustic Modeling for LVCSR

I-Fan Chen, Chin-Hui Lee

School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA

In this paper, we propose word-level hidden Markov models (HMMs) to supplement state-of-the-art phone-based acoustic modeling and enhance the performance of automatic speech recognition (ASR) systems. Each word in the vocabulary is initially modeled by well-trained triphone models. Maximum a posteriori (MAP) adaptation is then applied to generate models for words with a large number of occurrences in the training set, so that the acoustic distributions of those words can be modeled more precisely. Experimental results show that the proposed word-based systems outperform phone-based systems on the TIMIT task, which has a small training corpus. In tasks with plenty of training data, such as the WSJ task, word-based systems still show improvements over phone-based systems. Furthermore, the word-based systems discriminate better among short words and homophones, and they are more robust to language model weight variation than conventional phone-based systems.
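The MAP adaptation step described above can be sketched as follows. This is a minimal illustration of standard relevance-MAP mean adaptation for a single Gaussian; the relevance factor `tau` and the exact update variant used in the paper are assumptions, not taken from the source.

```python
import numpy as np

def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """MAP-adapt a Gaussian mean toward word-specific data.

    prior_mean: (D,) mean of the seed (triphone-derived) Gaussian
    frames:     (T, D) acoustic frames aligned to this word
    posteriors: (T,) occupation probabilities gamma_t for this Gaussian
    tau:        relevance factor weighting the prior (assumed value)
    """
    occ = posteriors.sum()              # soft count: sum_t gamma_t
    weighted_sum = posteriors @ frames  # sum_t gamma_t * x_t
    # Interpolate between the prior mean and the data mean,
    # with the prior dominating when the soft count is small.
    return (tau * prior_mean + weighted_sum) / (tau + occ)
```

With few word occurrences the adapted mean stays close to the triphone prior, while frequent words pull it toward their own acoustic statistics, which matches the paper's motivation for adapting only high-count words.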

Index Terms: word-level HMM, automatic speech recognition, detection-based ASR, language model weight, homophone


Bibliographic reference.  Chen, I-Fan / Lee, Chin-Hui (2012): "A study on using word-level HMMs to improve ASR performance over state-of-the-art phone-level acoustic modeling for LVCSR", In INTERSPEECH-2012, 1788-1791.