Tone modeling is a critical component of Mandarin large-vocabulary continuous-speech recognition (LVCSR) systems. In previous work on pitch-feature extraction, we reported character error rate reductions of more than 30% compared with the non-tonal baseline [1]. In this paper, we investigate how best to integrate tone modeling with a Mandarin LVCSR system.
The paper focuses on the two-stream method, which is based on two-stream continuous-mixture HMMs. For tonal languages, sub-word units may depend on both phonetic context and tone. To limit the resulting multiplication of model parameters, the two-stream method models state emission distributions as products of independent spectral and tonal mixtures. This allows the two streams to use sub-word units with different dependences and independent state tying, reducing model size and allowing tone-dependent modeling of initials.
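The product form of the emission distribution can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussian mixtures and equal stream weights, neither of which is stated in the abstract; the function names `log_gmm` and `two_stream_log_emission` and the toy parameters are hypothetical and chosen only for illustration. In the log domain, the product of the spectral and tonal mixtures becomes a sum of two per-stream log-likelihoods.

```python
import numpy as np

def log_gmm(x, weights, means, variances):
    """Log-likelihood of x under a diagonal-covariance Gaussian mixture.

    weights: shape (K,), means and variances: shape (K, D).
    (Diagonal covariances are an assumption of this sketch.)
    """
    x = np.asarray(x, dtype=float)
    # Per-component log N(x; mean, diag(var)), summed over dimensions.
    log_gauss = -0.5 * (
        np.log(2.0 * np.pi * variances) + (x - means) ** 2 / variances
    ).sum(axis=1)
    # Numerically stable log-sum-exp over the K components.
    a = np.log(weights) + log_gauss
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def two_stream_log_emission(o_spec, o_tone, spec_gmm, tone_gmm):
    """Log emission score of one HMM state with two independent streams.

    The state density is the product of a spectral mixture and a tonal
    mixture, so the log score is the sum of the two stream scores.
    Equal stream weights are assumed here.
    """
    return log_gmm(o_spec, *spec_gmm) + log_gmm(o_tone, *tone_gmm)

# Toy parameters (hypothetical): a 2-component spectral mixture over
# 3-dimensional features and a 1-component tonal mixture over 2 dimensions.
spec_gmm = (np.array([0.5, 0.5]), np.zeros((2, 3)), np.ones((2, 3)))
tone_gmm = (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2)))

score = two_stream_log_emission(np.zeros(3), np.zeros(2), spec_gmm, tone_gmm)
print(score)
```

Because each stream has its own mixture object, a state could in principle point to a spectral mixture shared across tones and a tonal mixture shared across phonetic contexts, which is one way to picture the independent state tying described above.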
We systematically compared the two-stream method with two other approaches that we named two-model and single-stream. The two-model method yields 5% higher error rates and cannot use one-pass Viterbi decoding, while the single-stream approach requires 30–50% more parameters at similar accuracy.
[1] H. Huang and F. Seide. Pitch tracking and tone features for Mandarin speech recognition. In Proc. ICASSP'2000, Istanbul, 2000.
Cite as: Seide, F., Wang, N.J.C. (2000) Two-stream modeling of Mandarin tones. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 867-870, doi: 10.21437/ICSLP.2000-407
@inproceedings{seide00_icslp, author={Frank Seide and Nick J.C. Wang}, title={{Two-stream modeling of Mandarin tones}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, pages={vol. 2, 867-870}, doi={10.21437/ICSLP.2000-407} }