11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Automatic Speech Recognition System Channel Modeling

Qun Feng Tan, Kartik Audhkhasi, Panayiotis G. Georgiou, Emil Ettelaie, Shrikanth S. Narayanan

University of Southern California, USA

In this paper, we present a systems approach for channel modeling of an Automatic Speech Recognition (ASR) system. Such a model can help improve speech recognition components, for example through discriminative language modeling. We simulate the ASR corruption using a phrase-based machine translation system trained on the reference phoneme sequences and the output phoneme sequences of a real ASR system. We demonstrate that local optimization of the quality of phoneme-to-phoneme mappings does not directly translate to an overall improvement of the entire model. However, we are still able to exploit the contextual information of the phonemes, which a simple acoustic distance model cannot do. Hence we show that the use of longer context results in a significantly improved model of the ASR channel.
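
As a rough illustration of why phoneme context can matter for a channel model, the toy sketch below estimates output-phoneme probabilities from aligned reference/output pairs, once without context and once conditioned on the preceding reference phoneme. This is not the phrase-based machine translation system described in the abstract; it is a simple count-based stand-in. The function name `estimate_channel`, the `toy_pairs` data, and the assumption that sequences are aligned one-to-one (no insertions or deletions) are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_channel(pairs, context=0):
    """Estimate P(output phoneme | reference phoneme plus `context` preceding
    reference phonemes) by relative frequency.

    `pairs` holds (reference, output) phoneme sequences that are assumed to be
    aligned one-to-one; a real system must also handle insertions and deletions.
    """
    counts = defaultdict(Counter)
    for ref, out in pairs:
        if len(ref) != len(out):
            raise ValueError("sequences must be aligned one-to-one")
        # Pad the left edge so every position has a full context window.
        padded = ["<s>"] * context + list(ref)
        for i, out_phone in enumerate(out):
            key = tuple(padded[i:i + context + 1])
            counts[key][out_phone] += 1
    # Normalize counts into conditional probabilities.
    return {
        key: {p: c / sum(ctr.values()) for p, c in ctr.items()}
        for key, ctr in counts.items()
    }


# Made-up data: "ih" is misrecognized as "eh" only when it follows "s".
toy_pairs = [
    (["t", "ih", "n"], ["t", "ih", "n"]),
    (["s", "ih", "n"], ["s", "eh", "n"]),
    (["s", "ih", "t"], ["s", "eh", "t"]),
    (["t", "ih", "p"], ["t", "ih", "p"]),
]

no_context = estimate_channel(toy_pairs, context=0)
with_context = estimate_channel(toy_pairs, context=1)
```

On this made-up data the context-free estimate for "ih" is split evenly between "ih" and "eh", while the estimate conditioned on the preceding phoneme separates the two cases completely. Real ASR errors are far less clean, so this only shows the mechanism, not the size of any gain.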

Bibliographic reference.  Tan, Qun Feng / Audhkhasi, Kartik / Georgiou, Panayiotis G. / Ettelaie, Emil / Narayanan, Shrikanth S. (2010): "Automatic speech recognition system channel modeling", In INTERSPEECH-2010, 2442-2445.