INTERSPEECH 2006 - ICSLP
This paper addresses robust automatic speech recognition (ASR) for accented Mandarin in car environments. A robust front-end is proposed, which adopts a Minimum Mean-Square Error (MMSE) estimator to suppress the background noise in the frequency domain, and then applies spectrum smoothing along both the time and frequency indices to compensate for spectral components distorted by noise over-reduction. In the context of Mandarin speech recognition, a particular adverse factor is the diversity of Chinese dialects: pronunciation differences among dialects degrade recognition performance when the acoustic models are trained on a database whose accent does not match that of the test speech. We propose to train the models on multiple accented Mandarin databases to solve this problem. Evaluation results on isolated phrase recognition show that the proposed front-end achieves an average error rate reduction (ERR) of 58.3% for artificial car noisy speech and 9.7% for real in-car speech, compared with a baseline in which no noise compensation technology is used. The effectiveness of the proposed model training scheme is also demonstrated in the experiments.
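The smoothing step described above can be illustrated with a minimal sketch. The abstract does not specify the smoothing form, so the recursive average over frames, the moving average over frequency bins, and the names `smooth_time`, `smooth_freq`, `smooth_spectrum`, `alpha` and `radius` are all assumptions made for illustration, not the authors' method. The input is taken to be a noise-suppressed magnitude spectrogram stored as a list of frames.

```python
from typing import List

Spectrogram = List[List[float]]  # frames x frequency bins (hypothetical layout)


def smooth_time(spec: Spectrogram, alpha: float = 0.5) -> Spectrogram:
    """First-order recursive average along the frame (time) index.

    Assumed form: s[t][k] = alpha * s[t-1][k] + (1 - alpha) * x[t][k].
    """
    if not spec:
        return []
    out = [list(spec[0])]  # the first frame has no predecessor, keep it as is
    for frame in spec[1:]:
        prev = out[-1]
        out.append([alpha * p + (1.0 - alpha) * x for p, x in zip(prev, frame)])
    return out


def smooth_freq(spec: Spectrogram, radius: int = 1) -> Spectrogram:
    """Moving average over neighbouring frequency bins, truncated at the edges."""
    out = []
    for frame in spec:
        n = len(frame)
        smoothed = []
        for k in range(n):
            lo, hi = max(0, k - radius), min(n, k + radius + 1)
            window = frame[lo:hi]
            smoothed.append(sum(window) / len(window))
        out.append(smoothed)
    return out


def smooth_spectrum(spec: Spectrogram, alpha: float = 0.5, radius: int = 1) -> Spectrogram:
    """Apply the time smoothing first, then the frequency smoothing."""
    return smooth_freq(smooth_time(spec, alpha), radius)
```

In this sketch an isolated spectral peak or hole left by over-aggressive noise suppression is spread over its neighbours in both dimensions; larger `alpha` or `radius` values smooth more strongly at the cost of spectral detail.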
Bibliographic reference. Ding, Pei / He, Lei / Yan, Xiang / Hao, Jie (2006): "Robust automatic speech recognition for accented Mandarin in car environments", In INTERSPEECH-2006, paper 1764-Thu2CaP.6.