The incorporation of prosodic information into large vocabulary continuous speech recognition has attracted much attention in recent years, especially for tonal languages such as Mandarin Chinese. The tones of some syllables are very difficult to recognize correctly because of their complicated prosodic behavior, and the resulting tone recognition errors seriously degrade recognition accuracy. We propose a new approach that introduces an extra tone category, "unknown": when the tone of a syllable is difficult to recognize, its tone information is not used. A two-stage prosodic model was developed for this purpose, and a 17.8% reduction in character error rate was achieved. Notably, this approach does not require speaker normalization of the prosodic features.
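The idea of leaving tone information unused when the tone is hard to recognize can be illustrated with a minimal sketch. The abstract does not specify how the two-stage prosodic model decides on the "unknown" category, so the confidence threshold below is a stand-in assumption and not the authors' method; the names `decide_tone` and `rescore`, the `threshold` and `weight` parameters, and the posterior values are all hypothetical.

```python
import math
from typing import Dict


def decide_tone(posteriors: Dict[str, float], threshold: float = 0.6) -> str:
    """Return the most likely tone label, or "unknown" when the top
    posterior falls below the (assumed) confidence threshold."""
    best = max(posteriors, key=lambda tone: posteriors[tone])
    return best if posteriors[best] >= threshold else "unknown"


def rescore(base_score: float, hyp_tone: str, posteriors: Dict[str, float],
            weight: float = 1.0, threshold: float = 0.6) -> float:
    """Add a weighted tone log-score to a hypothesis score, but only
    when the tone is judged reliable; otherwise leave the score as is."""
    if decide_tone(posteriors, threshold) == "unknown":
        # Tone is too uncertain: tone information is not used.
        return base_score
    # Floor the probability to avoid log(0) for tones missing from the dict.
    tone_prob = max(posteriors.get(hyp_tone, 0.0), 1e-10)
    return base_score + weight * math.log(tone_prob)


# Hypothetical tone posteriors for two syllables.
confident = {"1": 0.9, "2": 0.1}
ambiguous = {"1": 0.4, "2": 0.35, "3": 0.25}

score_confident = rescore(-10.0, "1", confident)  # tone score is added
score_ambiguous = rescore(-10.0, "2", ambiguous)  # score left unchanged
```

In this sketch an ambiguous syllable contributes no tone score at all, so a wrong tone decision cannot penalize an otherwise correct hypothesis.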
Bibliographic reference. Cheng, Li-Wei / Lee, Lin-shan (2008): "Improved large vocabulary Mandarin speech recognition by selectively using tone information with a two-stage prosodic model", In INTERSPEECH-2008, 1137-1140.