16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Pronunciation and Silence Probability Modeling for ASR

Guoguo Chen, Hainan Xu, Minhua Wu, Daniel Povey, Sanjeev Khudanpur

Johns Hopkins University, USA

In this paper we evaluate the word error rate (WER) improvement from modeling pronunciation probabilities and word-specific silence probabilities in speech recognition. We do this in the context of Finite State Transducer (FST)-based decoding, where pronunciation and silence probabilities are encoded in the lexicon (L) transducer. We describe a novel way to model word-dependent silence probabilities: in addition to modeling the probability of silence following each individual word, we also model the probability of each word appearing after silence. All of these probabilities are estimated from aligned training data, with suitable smoothing. We conduct our experiments on four commonly used automatic speech recognition datasets, namely Wall Street Journal, Switchboard, TED-LIUM, and Librispeech. The improvement from modeling pronunciation and silence probabilities is small but fairly consistent across datasets.
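As a rough illustration of estimating a word-specific silence probability from aligned data with smoothing, the following minimal sketch counts how often silence follows each word and interpolates that with the overall silence rate. It covers only the silence-following-a-word case. The `<sil>` marker, the function name, the `smoothing` constant, and the exact smoothing form are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

SIL = "<sil>"  # hypothetical silence marker in the alignments


def silence_after_word_probs(alignments, smoothing=2.0):
    """Estimate P(silence follows w) per word, smoothed toward the overall rate.

    `alignments` is a list of utterances, each a list of tokens in which
    silence appears as SIL. The smoothing form is an illustrative assumption.
    """
    word_count = Counter()
    sil_after_count = Counter()
    for utt in alignments:
        for i, tok in enumerate(utt):
            if tok == SIL:
                continue
            word_count[tok] += 1
            # Count the word as "followed by silence" if the next token is SIL.
            if i + 1 < len(utt) and utt[i + 1] == SIL:
                sil_after_count[tok] += 1

    total_words = sum(word_count.values())
    overall = sum(sil_after_count.values()) / total_words if total_words else 0.0

    # Additive smoothing: rare words fall back toward the overall silence rate.
    return {
        w: (sil_after_count[w] + smoothing * overall) / (word_count[w] + smoothing)
        for w in word_count
    }


alignments = [
    [SIL, "okay", SIL, "the", "cat", SIL],
    ["okay", SIL, "the", "dog"],
]
probs = silence_after_word_probs(alignments)
for word in sorted(probs):
    print(f"{word}: {probs[word]:.3f}")
```

In this toy data, "okay" is always followed by silence and "the" never is, so their smoothed estimates land above and below the overall rate of 0.5 without reaching 1 or 0.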


Bibliographic reference.  Chen, Guoguo / Xu, Hainan / Wu, Minhua / Povey, Daniel / Khudanpur, Sanjeev (2015): "Pronunciation and silence probability modeling for ASR", In INTERSPEECH-2015, 533-537.