Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Two-Stream Modeling of Mandarin Tones

Frank Seide, Nick J.C. Wang

Philips Research East-Asia, Taipei, Taiwan

Tone modeling is a critical component of Mandarin large-vocabulary continuous-speech recognition (LVCSR) systems. In previous work on pitch-feature extraction, we reported character error rate reductions of more than 30% compared with the non-tonal baseline [1]. In this paper, we investigate how best to integrate tone modeling with a Mandarin LVCSR system.

The paper focuses on the two-stream method, which is based on two-stream continuous-mixture HMMs. For tonal languages, sub-word units may depend on both phonetic context and tone. To limit the resulting multiplication of model parameters, the two-stream method models state emission distributions as products of independent spectral and tonal mixtures. This lets the two streams use sub-word units with different dependences and independent state tying, which reduces model size and allows tone-dependent modeling of initials.
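As an illustration of the product form described above, the following is a minimal sketch of a two-stream state emission score, not the authors' implementation. It assumes Gaussian mixtures with diagonal covariances for both streams and no stream weights; the function names and the toy parameters are hypothetical. Because the two mixtures are treated as independent, their product becomes a sum in the log domain.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_mixture(x, weights, means, variances):
    """Log density of a Gaussian mixture, computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def two_stream_log_emission(spectral_x, tonal_x, spectral_gmm, tonal_gmm):
    """Product of two independent stream mixtures, i.e. a sum of log densities."""
    return log_mixture(spectral_x, *spectral_gmm) + log_mixture(tonal_x, *tonal_gmm)

# Toy parameters (hypothetical): each mixture is (weights, means, variances).
spectral_gmm = ([0.6, 0.4], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
tonal_gmm = ([1.0], [[0.0]], [[1.0]])

# Log emission score for one frame with a 2-dim spectral and 1-dim tonal feature.
score = two_stream_log_emission([0.0, 0.0], [0.0], spectral_gmm, tonal_gmm)
```

Since each stream has its own mixture, the spectral and tonal mixtures can be shared (tied) across states independently of each other, which is where the parameter saving described above would come from.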

We systematically compared the two-stream method with two other approaches that we named two-model and single-stream. The two-model method yields 5% higher error rates and cannot use one-pass Viterbi decoding, while the single-stream approach requires 30-50% more parameters at similar accuracy.


  1. H. Huang and F. Seide. Pitch tracking and tone features for Mandarin speech recognition. In Proc. ICASSP'2000, Istanbul, 2000.

Bibliographic reference.  Seide, Frank / Wang, Nick J.C. (2000): "Two-stream modeling of Mandarin tones", In ICSLP-2000, vol.2, 867-870.