Sixth International Conference on Spoken Language Processing
October 16-20, 2000
Large Vocabulary Mandarin Speech Recognition with Different Approaches in Modeling Tones
Eric Chang, Jianlai Zhou, Shuo Di, Chao Huang, Kai-Fu Lee
Microsoft Research China
Large vocabulary continuous Mandarin speech recognition has
been an important problem for speech recognition researchers
for several reasons. First, Mandarin is a tonal language
that requires special treatment for the modeling of tones. There
are five tones in Mandarin, which are necessary to disambiguate
between otherwise confusable words. Second, the difficulty of entering
Chinese by keyboard presents a great opportunity for speech
recognition to improve computer usability. Previous approaches
to modeling tones have included using a separate tone classifier
and incorporating pitch directly into the feature vector.
In this paper, we describe a large vocabulary Mandarin speech
recognition system based on Microsoft’s Whisper system.
Several alternatives in modeling tones and their error rates on
continuous speech are compared.
Experimental results show a character error rate of 7.32% on
a test set of 50 speakers and 1000 sentences when no special
tone processing is performed in the acoustic model. When the
final syllable model set is expanded to include tones, the error
rate drops to 6.43% (a relative error rate reduction of 12.2%). When pitch
information and the larger final syllable set are used in
combination, the error rate is 6.03% (a cumulative relative error rate
reduction of 17.6%). This result suggests that other sources of
information such as energy and duration can also contribute
toward disambiguating between different tones.
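
The reduction percentages quoted above are relative to the 7.32% baseline rather than absolute differences. A minimal sketch of that arithmetic follows; the function and constant names are illustrative assumptions and do not come from the paper, and only the three error rates are taken from the text.

```python
def relative_error_reduction(baseline_cer, new_cer):
    """Return the relative error rate reduction, in percent.

    Both arguments are character error rates expressed in percent.
    """
    return (baseline_cer - new_cer) / baseline_cer * 100.0


# Character error rates reported in the abstract (percent).
BASELINE_CER = 7.32                 # no special tone processing
TONAL_FINALS_CER = 6.43             # final syllable set expanded with tones
TONAL_FINALS_PLUS_PITCH_CER = 6.03  # tonal finals combined with pitch

# Hypothetical variable names, used only for this illustration.
tonal_reduction = relative_error_reduction(BASELINE_CER, TONAL_FINALS_CER)
combined_reduction = relative_error_reduction(
    BASELINE_CER, TONAL_FINALS_PLUS_PITCH_CER
)

print(f"Tonal finals:         {tonal_reduction:.1f}% relative reduction")
print(f"Tonal finals + pitch: {combined_reduction:.1f}% relative reduction")
```

Rounded to one decimal place, the two values agree with the 12.2% and 17.6% figures reported in the abstract.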
- Lee L. S., et al., "Golden Mandarin - A Real Time
Mandarin Speech Dictation Machine for Chinese
Language with Very Large Vocabulary", IEEE Trans.
on Speech and Audio Processing, Vol. 1, No. 2, pp. 158-179, April 1993.
- Chen C. J., et al., "New Methods in Continuous
Mandarin Recognition", Proc. Eurospeech 97, Vol.
3, pp. 1543-1546.
Chang, Eric / Zhou, Jianlai / Di, Shuo / Huang, Chao / Lee, Kai-Fu (2000):
"Large vocabulary Mandarin speech recognition with different approaches in modeling tones",
In ICSLP-2000, vol.2, 983-986.