INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Improved Large Vocabulary Mandarin Speech Recognition by Selectively Using Tone Information with a Two-Stage Prosodic Model

Li-Wei Cheng, Lin-shan Lee

National Taiwan University, Taiwan

The incorporation of prosodic information in large vocabulary continuous speech recognition has attracted much attention in recent years, especially for tonal languages such as Mandarin Chinese. The tones of some syllables are very difficult to recognize correctly because of their complicated prosodic behavior, and tone recognition errors inevitably cause serious degradation in recognition accuracy. We propose a new approach that introduces an extra tone category, "unknown": when the tone of a syllable is difficult to recognize, the tone information is not used. A two-stage prosodic model was developed for this purpose, and a 17.8% reduction in character error rate was achieved. Notably, this approach does not require speaker normalization of the prosodic features.
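The abstract does not say how the two stages are built, so the following is only a minimal sketch of the general idea of selectively using tone information, not the authors' model. It assumes a hypothetical first step that maps low-confidence tone posteriors to "unknown" and a hypothetical second step that adjusts a hypothesis score only when the tone is known. The names `select_tone` and `rescore`, the threshold, and the bonus and penalty values are all illustrative assumptions.

```python
from typing import Mapping

UNKNOWN = "unknown"  # extra tone category for hard-to-recognize tones


def select_tone(tone_posteriors: Mapping[str, float],
                confidence_threshold: float = 0.6) -> str:
    """Return the most likely tone, or UNKNOWN if its posterior is too low.

    The threshold-based decision is an assumption made for illustration;
    the paper's actual decision rule is not given in the abstract.
    """
    best_tone = max(tone_posteriors, key=tone_posteriors.get)
    if tone_posteriors[best_tone] < confidence_threshold:
        return UNKNOWN
    return best_tone


def rescore(base_score: float,
            hypothesized_tone: str,
            selected_tone: str,
            tone_bonus: float = 1.0,
            tone_penalty: float = 1.0) -> float:
    """Adjust a hypothesis score using tone evidence only when it is known.

    With an UNKNOWN tone the score is left untouched, so an unreliable
    tone decision cannot hurt the hypothesis.
    """
    if selected_tone == UNKNOWN:
        return base_score
    if hypothesized_tone == selected_tone:
        return base_score + tone_bonus
    return base_score - tone_penalty


if __name__ == "__main__":
    confident = {"1": 0.90, "2": 0.05, "3": 0.03, "4": 0.02}
    ambiguous = {"1": 0.30, "2": 0.30, "3": 0.20, "4": 0.20}
    print(select_tone(confident))   # tone "1" is used
    print(select_tone(ambiguous))   # falls back to "unknown"
    print(rescore(-10.0, "3", select_tone(ambiguous)))  # score unchanged
```

In this sketch, a syllable whose tone is ambiguous contributes no tone evidence at all, which mirrors the abstract's statement that tone information is not used when the tone is difficult to recognize.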


Bibliographic reference. Cheng, Li-Wei / Lee, Lin-shan (2008): "Improved large vocabulary Mandarin speech recognition by selectively using tone information with a two-stage prosodic model", in INTERSPEECH-2008, 1137-1140.