7th International Conference on Spoken Language Processing
September 16-20, 2002
Automatic speech recognition in telecommunications environments still has lower accuracy than its desktop counterparts. Improving the performance of telephone-quality speech recognition is therefore an urgent problem for practical applications in this field. Previous work has shown that the main cause of this performance degradation is the mismatch between the training and testing sets introduced by differing telephone channels. In this paper, we propose an efficient implementation that dynamically compensates for this mismatch. The algorithm is based on maximum-likelihood (ML) estimation of the telephone channel and dynamically tracks time variations within the channel. It can handle degradation from linear channels (such as fixed telephone lines) as well as from some noisy nonlinear channels (such as some long-distance lines and wireless circuits, e.g. GSM). In our experiments on Mandarin large vocabulary continuous speech recognition (LVCSR) over telephone lines, the average character error rate (CER) decreases by more than 20% when the algorithm is applied. At the same time, the structural delay and computational cost of the algorithm are limited, with an average delay of about 300 to 400 ms, so it can be embedded in practical telephone-based applications.
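The abstract does not give the update equations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a linear channel appears as an additive bias in the cepstral domain, that a codebook of clean-speech centroids is available, and that each codeword is modeled with the same spherical variance, so that the ML bias estimate under hard codeword assignment reduces to an average of residuals. The function names, the `forgetting` parameter, and the exponential-forgetting update are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def nearest_codeword(frame: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codeword closest to `frame` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for k, codeword in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(frame, codeword))
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index


def track_channel_bias(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
    forgetting: float = 0.95,  # hypothetical smoothing constant, not from the paper
) -> Tuple[List[Vector], Vector]:
    """Estimate a slowly varying additive channel bias frame by frame.

    Assumed model: observed frame y_t = clean frame x_t + channel bias h_t.
    For each frame, the current bias is removed, the nearest clean codeword
    is selected, and the bias is nudged toward the residual y_t - codeword.
    """
    bias: Vector = [0.0] * len(codebook[0])
    compensated: List[Vector] = []
    for y in frames:
        # Compensate with the current estimate before choosing a codeword.
        x_hat = [yi - bi for yi, bi in zip(y, bias)]
        k = nearest_codeword(x_hat, codebook)
        # Residual between the observation and the selected clean codeword.
        residual = [yi - ci for yi, ci in zip(y, codebook[k])]
        # Recursive average with exponential forgetting follows slow channel drift.
        bias = [forgetting * b + (1.0 - forgetting) * r for b, r in zip(bias, residual)]
        compensated.append([yi - bi for yi, bi in zip(y, bias)])
    return compensated, bias
```

In this sketch a smaller `forgetting` value would follow channel changes more quickly at the cost of a noisier estimate. The sketch itself adds no look-ahead delay and is not meant to model the 300 to 400 ms delay reported above.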
Bibliographic reference. Zhang, Huayun / Han, Zhaobing / Xu, Bo (2002): "Codebook dependent dynamic channel estimation for Mandarin speech recognition over telephone", In ICSLP-2002, 2197-2200.