ISCA Archive ICSLP 2000

Two-stream modeling of Mandarin tones

Frank Seide, Nick J.C. Wang

Tone modeling is a critical component of Mandarin large-vocabulary continuous-speech recognition (LVCSR) systems. In previous work on pitch-feature extraction, we reported character error rate reductions of over 30% relative to the non-tonal baseline [1]. In this paper, we investigate how best to integrate tone modeling into a Mandarin LVCSR system.

The paper focuses on the two-stream method, which is based on two-stream continuous-mixture HMMs. For tonal languages, sub-word units may depend on both phonetic context and tone. To counter the resulting multiplication of model parameters, the two-stream method models state emission distributions as products of independent spectral and tonal mixtures. This allows sub-word units with different dependencies and independent state tying for the two streams, reducing model size and allowing tone-dependent modeling of initials.
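A minimal sketch of the emission computation described above: the state likelihood is the product of a spectral-stream mixture and a tonal-stream mixture, i.e. a sum of log-likelihoods. It assumes diagonal-covariance Gaussian components and no stream weights (a plain product). The class and function names, feature dimensions, and all model values are hypothetical illustrations, not the paper's configuration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DiagGMM:
    """Diagonal-covariance Gaussian mixture for one stream of one tied state."""
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one diagonal variance vector per component

    def log_likelihood(self, x: Sequence[float]) -> float:
        """log p(x) under the mixture, combined with log-sum-exp for stability."""
        comp_ll = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            comp_ll.append(ll)
        top = max(comp_ll)
        return top + math.log(sum(math.exp(c - top) for c in comp_ll))


def two_stream_log_emission(spectral: DiagGMM, tonal: DiagGMM,
                            spectral_obs: Sequence[float],
                            tonal_obs: Sequence[float]) -> float:
    """Product of independent stream densities, i.e. a sum in the log domain."""
    return spectral.log_likelihood(spectral_obs) + tonal.log_likelihood(tonal_obs)


# Hypothetical tied states: one spectral state for the final "a" is shared by
# all tones, while the tonal stream has its own tone-dependent states.
spectral_states = {"a": DiagGMM([1.0], [[0.0, 0.0]], [[1.0, 1.0]])}
tonal_states = {
    "tone1": DiagGMM([1.0], [[1.0]], [[0.5]]),   # hypothetical high-level pitch
    "tone4": DiagGMM([1.0], [[-1.0]], [[0.5]]),  # hypothetical falling pitch
}

frame_spec, frame_tone = [0.1, -0.2], [0.9]  # made-up feature frame
score_a1 = two_stream_log_emission(spectral_states["a"], tonal_states["tone1"],
                                   frame_spec, frame_tone)
score_a4 = two_stream_log_emission(spectral_states["a"], tonal_states["tone4"],
                                   frame_spec, frame_tone)
print(f"a1: {score_a1:.3f}  a4: {score_a4:.3f}")
```

Because the spectral state for "a" is shared across tones in this toy setup, adding tone dependence only adds small tonal-stream mixtures, which illustrates how independent tying of the two streams can keep model size down.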

We systematically compared the two-stream method with two other approaches, which we call two-model and single-stream. The two-model method yields 5% higher error rates and cannot use one-pass Viterbi decoding, while the single-stream approach requires 30 to 50% more parameters at similar accuracy.

[1] H. Huang and F. Seide. Pitch tracking and tone features for Mandarin speech recognition. In Proc. ICASSP 2000, Istanbul, 2000.


doi: 10.21437/ICSLP.2000-407

Cite as: Seide, F., Wang, N.J.C. (2000) Two-stream modeling of Mandarin tones. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 867-870, doi: 10.21437/ICSLP.2000-407

@inproceedings{seide00_icslp,
  author={Frank Seide and Nick J.C. Wang},
  title={{Two-stream modeling of Mandarin tones}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 2, 867-870},
  doi={10.21437/ICSLP.2000-407}
}