11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Unsupervised Model Adaptation on Targeted Speech Segments for LVCSR System Combination

Richard Dufour, Fethi Bougares, Yannick Estève, Paul Deléglise

LIUM, University of Le Mans, France

In the context of Large-Vocabulary Continuous Speech Recognition (LVCSR), systems can reach a high level of performance on prepared speech, while their performance drops on spontaneous speech. This decrease is due to the fact that these two kinds of speech exhibit strong acoustic and linguistic differences. Previous research has sought to detect and repair some peculiarities of spontaneous speech, such as disfluencies, and to create specific models to improve recognition accuracy; however, a large amount of data is needed to obtain improvements, and such data is expensive to collect. In this paper, we present a solution to create specialized acoustic and language models by automatically extracting, from the initial training corpus, a data subset containing spontaneous speech, and adapting the initial acoustic and language models on it. As we assume these models can be complementary, we propose to combine the outputs of the general and adapted ASR systems. Experimental results show a statistically significant gain at negligible cost (no additional training data and no human intervention).
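The combination step described above can be illustrated with a minimal sketch. This is a hypothetical, simplified confidence-based word voting between two ASR outputs, in the spirit of ROVER-style combination; the paper's actual combination method and data structures may differ, and the alignment step is assumed to have already been done.

```python
def combine_hypotheses(general, adapted):
    """Pick, at each aligned position, the word with the higher confidence.

    Each hypothesis is a list of (word, confidence) pairs produced by one
    ASR system; an empty word '' marks a gap in the alignment.
    This is an illustrative sketch, not the paper's exact algorithm.
    """
    combined = []
    for (w1, c1), (w2, c2) in zip(general, adapted):
        # Keep the word from whichever system is more confident here.
        word = w1 if c1 >= c2 else w2
        if word:  # skip alignment gaps
            combined.append(word)
    return combined

# Toy example: the adapted system fixes a misrecognized word.
general = [("the", 0.9), ("whether", 0.4), ("is", 0.8), ("nice", 0.7)]
adapted = [("the", 0.8), ("weather", 0.9), ("is", 0.9), ("nice", 0.6)]
print(" ".join(combine_hypotheses(general, adapted)))
# -> the weather is nice
```

The intuition is that the general and adapted systems make partly uncorrelated errors, so position-by-position voting can recover words that either system alone gets wrong.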


Bibliographic reference.  Dufour, Richard / Bougares, Fethi / Estève, Yannick / Deléglise, Paul (2010): "Unsupervised model adaptation on targeted speech segments for LVCSR system combination", In INTERSPEECH-2010, 885-888.