INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Integrating Stress Information in Large Vocabulary Continuous Speech Recognition

Bogdan Ludusan, Stefan Ziegler, Guillaume Gravier

CNRS-IRISA, Rennes, France

In this paper we propose a novel method for integrating stress information in the decoding step of a speech recognizer. A multiscale rhythm model is used to determine a stress score for each syllable, and these scores are then used to reinforce paths during search. Two strategies for integrating the stress were employed: the first one reinforces paths through all syllables with a value proportional to their stress score, while the second one reinforces only paths passing through stressed syllables, but with a constant value. The former strategy slightly outperforms the latter, bringing a relative improvement of more than 2% over the baseline. Furthermore, the stress information proved to be a robust feature, performing well even for foreign-accented speech.
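The two integration strategies described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the additive log-domain bonus, the threshold used to decide whether a syllable counts as stressed, and all names (`proportional_bonus`, `constant_bonus`, `rescore`, `weight`, `threshold`, `bonus`) are assumptions made for illustration only.

```python
from typing import Sequence


def proportional_bonus(stress_scores: Sequence[float], weight: float) -> float:
    """Strategy 1 (assumed form): every syllable on the path adds a bonus
    proportional to its stress score."""
    return weight * sum(stress_scores)


def constant_bonus(stress_scores: Sequence[float], threshold: float, bonus: float) -> float:
    """Strategy 2 (assumed form): only syllables whose score exceeds a
    hypothetical threshold count as stressed; each adds the same fixed bonus."""
    stressed = sum(1 for score in stress_scores if score > threshold)
    return bonus * stressed


def rescore(path_log_score: float,
            stress_scores: Sequence[float],
            strategy: str = "proportional",
            weight: float = 1.0,
            threshold: float = 0.5,
            bonus: float = 1.0) -> float:
    """Add the stress-based reinforcement to a path's log-domain score.
    Treating the reinforcement as an additive term is an assumption."""
    if strategy == "proportional":
        return path_log_score + proportional_bonus(stress_scores, weight)
    if strategy == "constant":
        return path_log_score + constant_bonus(stress_scores, threshold, bonus)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical stress scores for the three syllables of one candidate path.
scores = [0.5, 0.25, 1.0]
print(rescore(-10.0, scores, "proportional", weight=2.0))
print(rescore(-10.0, scores, "constant", threshold=0.4, bonus=0.5))
```

In this sketch the first strategy rewards a path according to how strongly each of its syllables is stressed, while the second ignores the magnitude of the score once a syllable is judged stressed.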

Index Terms: speech recognition, stress, rhythm


Bibliographic reference.  Ludusan, Bogdan / Ziegler, Stefan / Gravier, Guillaume (2012): "Integrating stress information in large vocabulary continuous speech recognition", In INTERSPEECH-2012, 2642-2645.