In this paper we propose a novel method for integrating stress information into the decoding step of a speech recognizer. A multiscale rhythm model was used to determine a stress score for each syllable, and these scores were then used to reinforce paths during search. Two strategies for integrating the stress were employed: the first reinforces paths through all syllables with a value proportional to their stress score, while the second enhances only paths passing through stressed syllables, using a constant value. The former strategy slightly outperforms the latter, bringing a relative improvement of more than 2% over the baseline. Furthermore, the stress information proved to be a robust feature, performing well even on foreign-accented speech.
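The two integration strategies can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `weight`, `threshold`, and `bonus` parameters, and the assumption that path scores are log-domain values to which a bonus is added are all hypothetical choices made for illustration.

```python
from typing import List


def proportional_bonus(stress_scores: List[float], weight: float) -> float:
    """Strategy 1 (sketch): every syllable on the path contributes
    a bonus proportional to its stress score."""
    return sum(weight * s for s in stress_scores)


def constant_bonus(stress_scores: List[float], threshold: float, bonus: float) -> float:
    """Strategy 2 (sketch): only syllables treated as stressed contribute,
    each with the same constant bonus. Treating a syllable as stressed when
    its score exceeds `threshold` is an assumption of this example."""
    return sum(bonus for s in stress_scores if s > threshold)


def rescore_path(
    path_log_score: float,
    stress_scores: List[float],
    strategy: str = "proportional",
    weight: float = 0.5,
    threshold: float = 0.5,
    bonus: float = 1.0,
) -> float:
    """Add the stress bonus to a path's (assumed log-domain) score."""
    if strategy == "proportional":
        return path_log_score + proportional_bonus(stress_scores, weight)
    if strategy == "constant":
        return path_log_score + constant_bonus(stress_scores, threshold, bonus)
    raise ValueError(f"unknown strategy: {strategy}")
```

In this sketch the proportional variant rewards every syllable to a graded degree, whereas the constant variant depends on a hard stressed/unstressed decision.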
Index Terms: speech recognition, stress, rhythm
Bibliographic reference. Ludusan, Bogdan / Ziegler, Stefan / Gravier, Guillaume (2012): "Integrating stress information in large vocabulary continuous speech recognition", In INTERSPEECH-2012, 2642-2645.