INTERSPEECH 2006 - ICSLP
Tone plays a crucial role in Mandarin speech in distinguishing otherwise ambiguous words. Most state-of-the-art Mandarin automatic speech recognition systems adopt embedded tone modeling, in which tonal acoustic units are used and F0 features are appended to the spectral feature vector. In this paper, we combine the embedded approach (using improved F0 smoothing) with explicit tone modeling, which is applied by rescoring the output lattices. Oracle experiments indicate that a 32% relative improvement can be achieved by rescoring with perfect tone information. Recognition experiments on Mandarin broadcast news show that, even with an accuracy of only 70%, the explicit tone classifier offers complementary knowledge and improves performance significantly. By combining these tone modeling techniques, the character error rate on the CTV test set is reduced from 13.0% to 11.5%.
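The idea of rescoring with an explicit tone classifier can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes a simple N-best list for the lattices used in the paper, and the function name `rescore_nbest`, the `tone_weight` parameter, the probability floor, and the data layout are all hypothetical choices made for illustration. Each hypothesis carries a base log score from the recognizer, and the weighted log posterior of each hypothesized tone is added to it.

```python
import math


def rescore_nbest(hypotheses, tone_posteriors, tone_weight=1.0, floor=1e-6):
    """Return the best (syllables, score) after adding tone log posteriors.

    hypotheses: list of (syllables, base_log_score); syllables is a list of
        (segment_id, tone) pairs (hypothetical layout, not from the paper).
    tone_posteriors: dict mapping segment_id -> {tone: probability}, as an
        explicit tone classifier might produce.
    tone_weight: assumed scaling factor for the tone score.
    floor: assumed minimum probability, to avoid log(0).
    """
    best_syllables, best_score = None, -math.inf
    for syllables, base_log_score in hypotheses:
        # Sum the log posterior of every hypothesized tone in this hypothesis.
        tone_log_score = sum(
            math.log(max(tone_posteriors[seg].get(tone, 0.0), floor))
            for seg, tone in syllables
        )
        total = base_log_score + tone_weight * tone_log_score
        if total > best_score:
            best_syllables, best_score = syllables, total
    return best_syllables, best_score
```

For instance, with two hypotheses that differ only in the tone of one syllable, a classifier that favors the second tone can overturn a small base-score advantage of the first; setting `tone_weight=0.0` recovers the original ranking.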
Bibliographic reference. Lei, Xin / Siu, Manhung / Hwang, Mei-Yuh / Ostendorf, Mari / Lee, Tan (2006): "Improved tone modeling for Mandarin broadcast news speech recognition", In INTERSPEECH-2006, paper 1752-Tue3A2O.4.