Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Improved Tone Recognition by Normalizing for Coarticulation and Intonation Effects

Chao Wang, Stephanie Seneff

Spoken Language Systems Group, Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA

We have previously demonstrated that tone modeling improves speech recognition on a digit corpus [1]. In this work, we further improve tone recognition by normalizing for both tone coarticulation and intonation effects. Tone classification errors on continuous digit strings were reduced by 26.1% from the baseline when the effects of F0 downdrift, phrase boundaries, and tone coarticulation were normalized. We also applied the same approach to conversational speech from the YINHE domain [2] and obtained similar improvements. The word error rate on spontaneous YINHE data was reduced by 16.5% when a simple four-tone model was applied to re-sort the recognizer's 10-best outputs.
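
The re-sorting of N-best outputs mentioned above can be illustrated with a minimal sketch. The abstract does not say how the tone model score is combined with the recognizer score, so the log-linear combination, the `tone_weight` parameter, the `Hypothesis` type, and all numeric scores below are hypothetical and serve only to show the general idea of N-best re-sorting.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    words: str               # hypothesized word string (placeholder text here)
    recognizer_score: float  # log score from the first-pass recognizer
    tone_score: float        # log score from a separate tone model (assumed given)


def resort_nbest(hyps, tone_weight=1.0):
    """Return the hypotheses sorted best-first by a combined score.

    Assumption: the combined score is recognizer_score + tone_weight * tone_score.
    This weighting scheme is illustrative, not taken from the paper.
    """
    return sorted(
        hyps,
        key=lambda h: h.recognizer_score + tone_weight * h.tone_score,
        reverse=True,
    )


# Made-up scores for three competing hypotheses (a real list would hold 10).
nbest = [
    Hypothesis("hypothesis A", -10.0, -6.0),
    Hypothesis("hypothesis B", -10.5, -2.0),
    Hypothesis("hypothesis C", -12.0, -3.0),
]

reranked = resort_nbest(nbest, tone_weight=0.5)
for h in reranked:
    print(h.words)
```

With these invented scores, the tone model promotes a hypothesis that the first-pass recognizer ranked second; setting `tone_weight` to 0 leaves the original order unchanged.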


  1. C. Wang and S. Seneff, "A study of tones and tempo in continuous Mandarin digit strings and their application in telephone quality speech recognition," in Proc. ICSLP’98, Sydney, Australia, pp. 635-638, 1998.
  2. C. Wang, J. Glass, H. Meng, J. Polifroni, S. Seneff, and V. Zue, "Yinhe: A Mandarin Chinese version of the Galaxy system," in Proc. Eurospeech’97, Rhodes, Greece, pp. 351-354, 1997.

Bibliographic reference.  Wang, Chao / Seneff, Stephanie (2000): "Improved tone recognition by normalizing for coarticulation and intonation effects", In ICSLP-2000, vol.2, 83-86.