EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

Prosody Dependent Speech Recognition with Explicit Duration Modelling at Intonational Phrase Boundaries

K. Chen, S. Borys, Mark Hasegawa-Johnson, J. Cole

University of Illinois at Urbana-Champaign, USA

Does prosody help word recognition? In this paper, we propose a novel probabilistic framework in which words and phonemes are dependent on prosody in a way that improves word recognition. The prosodic attribute that we investigate in this study is the lengthening of speech segments in the vicinity of intonational phrase boundaries. An Explicit Duration Hidden Markov Model (EDHMM) is implemented to provide an accurate phoneme duration model. This study is conducted on the Boston University Radio News Corpus, with prosodic boundaries marked using the ToBI labelling system. We found that lengthening of phrase-final rhymes can be reliably modelled by the EDHMM, which significantly improves prosody-dependent acoustic modelling. Conversely, no systematic duration variation is found at phrase-initial position. With prosody dependence implemented in the acoustic model, pronunciation model, and language model, both word recognition accuracy and boundary recognition accuracy are improved by 1% over systems without prosody dependence.
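As a rough illustration of what explicit duration modelling means here, the sketch below scores a phoneme segment as its acoustic log-likelihood plus the log-probability of its duration, with separate duration parameters for phrase-final and non-final positions. The gamma distribution, the parameter values, and the names gamma_log_pdf, DURATION_PARAMS, and segment_log_score are all assumptions made for illustration; the abstract does not state which duration distribution or parameters the authors used.

```python
import math

def gamma_log_pdf(d, shape, scale):
    """Log density of a gamma distribution at duration d (in frames)."""
    return ((shape - 1) * math.log(d) - d / scale
            - math.lgamma(shape) - shape * math.log(scale))

# Hypothetical (shape, scale) duration parameters for one phoneme,
# keyed by prosodic position. The values are invented for illustration.
DURATION_PARAMS = {
    "non_final": (4.0, 2.0),     # mean duration of 8 frames
    "phrase_final": (4.0, 3.5),  # mean duration of 14 frames (lengthened)
}

def segment_log_score(frame_log_likes, position):
    """Acoustic log-likelihood of the segment plus log P(duration | position)."""
    d = len(frame_log_likes)
    shape, scale = DURATION_PARAMS[position]
    return sum(frame_log_likes) + gamma_log_pdf(d, shape, scale)

# A 14-frame segment with dummy per-frame acoustic log-likelihoods.
frames = [-1.0] * 14
final = segment_log_score(frames, "phrase_final")
non_final = segment_log_score(frames, "non_final")
```

With identical acoustic scores, the long segment scores higher under the phrase-final duration model than under the non-final one, which is the kind of evidence a prosody-dependent recogniser could use to favour a boundary hypothesis.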


Bibliographic reference.  Chen, K. / Borys, S. / Hasegawa-Johnson, Mark / Cole, J. (2003): "Prosody dependent speech recognition with explicit duration modelling at intonational phrase boundaries", In EUROSPEECH-2003, 393-396.