Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Large Vocabulary Mandarin Speech Recognition with Different Approaches in Modeling Tones

Eric Chang, Jianlai Zhou, Shuo Di, Chao Huang, Kai-Fu Lee

Microsoft Research China, Beijing, China

Large vocabulary continuous Mandarin speech recognition has been an important problem for speech recognition researchers for several reasons [1], [2]. First, Mandarin is a tonal language, so tones require special treatment in acoustic modeling: Mandarin has five tones, which are needed to disambiguate otherwise confusable words. Second, because entering Chinese by keyboard is difficult, speech recognition offers a great opportunity to improve computer usability. Previous approaches to modeling tones include using a separate tone classifier [1] and incorporating pitch directly into the feature vector [2]. In this paper, we describe a large vocabulary Mandarin speech recognition system based on Microsoft’s Whisper system and compare the character error rates of several alternative approaches to tone modeling on continuous speech.
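The second approach, appending pitch to the acoustic feature vector, can be sketched as follows. This is a minimal illustration under assumptions that come neither from this paper nor from [2]. An external pitch tracker is assumed to supply one F0 value per frame, with `None` marking unvoiced frames. Unvoiced gaps are filled by linear interpolation, and log-F0 plus a simple first-order delta are then concatenated onto each base frame (for example, MFCCs). The helper names `interpolate_f0`, `delta`, and `append_pitch` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def interpolate_f0(f0: Sequence[Optional[float]]) -> List[float]:
    """Fill unvoiced frames (None) by linear interpolation between voiced
    neighbours; leading/trailing unvoiced frames copy the nearest voiced value."""
    voiced = [i for i, v in enumerate(f0) if v is not None]
    if not voiced:
        raise ValueError("utterance has no voiced frames")
    out: List[float] = []
    for i, v in enumerate(f0):
        if v is not None:
            out.append(v)
            continue
        prev = max((j for j in voiced if j < i), default=None)
        nxt = min((j for j in voiced if j > i), default=None)
        if prev is None:
            out.append(f0[nxt])          # before the first voiced frame
        elif nxt is None:
            out.append(f0[prev])         # after the last voiced frame
        else:
            w = (i - prev) / (nxt - prev)
            out.append(f0[prev] + w * (f0[nxt] - f0[prev]))
    return out


def delta(x: Sequence[float]) -> List[float]:
    """First-order difference: central in the interior, one-sided at the edges."""
    n = len(x)
    if n == 1:
        return [0.0]
    d = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        d.append((x[hi] - x[lo]) / (hi - lo))
    return d


def append_pitch(frames: Sequence[Sequence[float]],
                 f0: Sequence[Optional[float]]) -> List[List[float]]:
    """Concatenate log-F0 and its delta onto each acoustic feature frame."""
    if len(frames) != len(f0):
        raise ValueError("frames and f0 must have the same length")
    log_f0 = [math.log(v) for v in interpolate_f0(f0)]
    d_log_f0 = delta(log_f0)
    return [list(fr) + [lf, dl] for fr, lf, dl in zip(frames, log_f0, d_log_f0)]


if __name__ == "__main__":
    base = [[1.0, 2.0]] * 4                  # toy 2-dim "MFCC" frames
    pitch = [100.0, None, 200.0, None]       # Hz; None = unvoiced
    for row in append_pitch(base, pitch):
        print([round(v, 3) for v in row])
```

Interpolating across unvoiced frames keeps the pitch stream continuous, which suits HMM systems that expect a feature value in every frame. The pitch features and smoothing actually used in this paper or in [2] may differ.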

Experimental results show a character error rate of 7.32% on a test set of 1000 sentences from 50 speakers when no special tone processing is performed in the acoustic model. When the set of syllable-final models is expanded to include tones, the error rate drops to 6.43% (a relative error rate reduction of 12.2%). When pitch information and the tone-expanded final set are used in combination, the error rate is 6.03% (a cumulative relative error rate reduction of 17.6%). Because the tone-expanded final set improves accuracy even without explicit pitch features, this result suggests that other sources of information, such as energy and duration, can also help disambiguate between tones.
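The percentages in parentheses are relative reductions, and both appear to be measured against the 7.32% baseline rather than chained from one system to the next. The following is a quick check of that arithmetic using only the figures reported above. The helper `relative_reduction` is a hypothetical name.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative error rate reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline - new) / baseline


BASELINE_CER = 7.32  # % character error rate, no special tone processing

for label, cer in [("tonal finals", 6.43), ("tonal finals + pitch", 6.03)]:
    print(f"{label}: {relative_reduction(BASELINE_CER, cer):.1f}%")
# prints:
# tonal finals: 12.2%
# tonal finals + pitch: 17.6%
```

Measured step by step instead, adding pitch on top of the tone-expanded finals (6.43% to 6.03%) would be a relative reduction of about 6.2%.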


  1. Lee, L. S., et al., "Golden Mandarin - A Real Time Mandarin Speech Dictation Machine for Chinese Language with Very Large Vocabulary", IEEE Trans. on Speech and Audio Processing, vol. 1, no. 2, pp. 158-179, April 1993.
  2. Chen, C. J., et al., "New Methods in Continuous Mandarin Recognition", Proc. Eurospeech 97, vol. 3, pp. 1543-1546, 1997.


Bibliographic reference.  Chang, Eric / Zhou, Jianlai / Di, Shuo / Huang, Chao / Lee, Kai-Fu (2000): "Large vocabulary Mandarin speech recognition with different approaches in modeling tones", In ICSLP-2000, vol.2, 983-986.