EUROSPEECH 2003 - INTERSPEECH 2003
The degradation of speech recognition performance in real-life environments and over transmission channels is a major obstacle for many speech-based applications, especially in the presence of non-stationary noise and changing channels. In this paper, we extend our previous work on Maximum-Likelihood (ML) dynamic channel compensation by introducing a phone-conditioned prior statistical model for the channel bias and applying Maximum A Posteriori (MAP) estimation. Compared to the ML-based method, the new MAP-based algorithm tracks variations within channels more effectively. The average structural delay of the algorithm decreases from 400 ms to 200 ms, which makes it better suited to compensating short utterances, as are common in many real applications. An additional 7-8% relative reduction in character error rate is observed in telephone-based Mandarin large vocabulary continuous speech recognition (LVCSR). In a short-utterance test, the word error rate is reduced by 30% relative.
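As a rough illustration of why a prior helps when few frames are available, the sketch below contrasts an ML and a MAP estimate of a scalar additive channel bias. It is not the algorithm of the paper: it assumes a single feature dimension, known frame-aligned model means, a shared Gaussian noise variance, and one Gaussian prior on the bias rather than the phone-conditioned prior described above. The function names `ml_bias` and `map_bias` and all parameters are hypothetical.

```python
def ml_bias(observed, clean_means):
    """ML estimate of the bias: the average residual between the
    observed features and the aligned model means (equal variances assumed)."""
    residuals = [y - mu for y, mu in zip(observed, clean_means)]
    return sum(residuals) / len(residuals)


def map_bias(observed, clean_means, noise_var, prior_mean, prior_var):
    """MAP estimate of the bias under an assumed Gaussian prior
    N(prior_mean, prior_var) and Gaussian noise with variance noise_var."""
    residual_sum = sum(y - mu for y, mu in zip(observed, clean_means))
    n = len(observed)
    # Posterior precision = prior precision + data precision.
    precision = 1.0 / prior_var + n / noise_var
    # Posterior mean = precision-weighted blend of prior mean and data.
    return (prior_mean / prior_var + residual_sum / noise_var) / precision


# Two frames only: the MAP estimate is pulled toward the prior mean.
observed = [1.5, 2.5]
clean_means = [1.0, 1.0]
b_ml = ml_bias(observed, clean_means)
b_map = map_bias(observed, clean_means, noise_var=1.0,
                 prior_mean=0.0, prior_var=1.0)
```

With very few frames the MAP estimate stays close to the prior mean, and as the number of frames grows (or the prior variance becomes large) it approaches the ML estimate. Under these assumptions, that is one way a prior can yield usable estimates from shorter segments.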
Bibliographic reference. Zhang, Huayun / Han, Zhaobing / Xu, Bo (2003): "Dynamic channel compensation based on maximum a posteriori estimation", In EUROSPEECH-2003, 2137-2140.