INTERSPEECH 2011
12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Modeling Broad Context for Tone Recognition with Conditional Random Fields

Siwei Wang (1), Gina-Anne Levow (2)

(1) University of Chicago, USA
(2) University of Washington, USA

We propose a tone recognition approach that employs linear-chain Conditional Random Fields (CRFs) to model tone variation due to intonation effects. We implement three linear-chain CRFs that model intonation effects at phrase-, sentence-, and story-level boundaries, where we show that standard recognition techniques degrade and common normalization approaches do not help. All three linear-chain CRFs outperform the baseline unigram model; the largest gain is in recognizing 3rd tones (4% in overall accuracy). In particular, Phrase Bigram CRFs show a substantial 39% improvement in recognizing 3rd tones located at initial boundaries. This improvement indicates that the position-specific modeling of initial tones in bigram CRFs captures intonation effects better than the baseline unigram model.
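
To illustrate the contrast between unigram and position-aware chain decoding, the following is a minimal sketch and not the authors' implementation. It assumes four tone labels (indices 0 to 3), hand-picked scores in place of learned CRF weights, and hypothetical names (`unigram_decode`, `viterbi_decode`, `EMISSIONS`, `TRANSITIONS`, `INITIAL`). A trained CRF would learn these scores from data; here they are invented only to show how a phrase-initial score can change the decoded tone.

```python
def unigram_decode(emissions):
    """Pick the best-scoring tone for each syllable independently."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in emissions]


def viterbi_decode(emissions, transitions, initial):
    """Return the best tone sequence under a linear-chain scoring model.

    emissions[t][y]   : score of tone y at syllable t (e.g. from acoustics)
    transitions[p][y] : score of tone y following tone p (bigram context)
    initial[y]        : extra score for tone y at the phrase-initial position
    """
    n_tones = len(initial)
    # best[y] holds the score of the best path ending in tone y so far.
    best = [initial[y] + emissions[0][y] for y in range(n_tones)]
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = [], []
        for y in range(n_tones):
            prev = max(range(n_tones), key=lambda p: best[p] + transitions[p][y])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][y] + scores[y])
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final tone to recover the full sequence.
    y = max(range(n_tones), key=best.__getitem__)
    path = [y]
    for pointers in reversed(backpointers):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path


# Toy scores (invented for illustration, not taken from the paper).
# Index 2 stands for the 3rd tone. At the first syllable the acoustics
# slightly favor index 1 over index 2.
EMISSIONS = [
    [0.0, 1.0, 0.8, 0.0],
    [1.0, 0.0, 0.0, 0.2],
]
TRANSITIONS = [[0.0] * 4 for _ in range(4)]  # no bigram preference in this toy
INITIAL = [0.0, 0.0, 0.5, 0.0]               # phrase-initial bonus for index 2

print(unigram_decode(EMISSIONS))                        # [1, 0]
print(viterbi_decode(EMISSIONS, TRANSITIONS, INITIAL))  # [2, 0]
```

In this toy case the unigram decoder labels the first syllable with index 1, while the chain decoder, which scores the phrase-initial position separately, recovers index 2 (the 3rd tone).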


Bibliographic reference.  Wang, Siwei / Levow, Gina-Anne (2011): "Modeling broad context for tone recognition with conditional random fields", In INTERSPEECH-2011, 2289-2292.