12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Template-Based Automatic Speech Recognition Meets Prosody

Dino Seppi, Kris Demuynck, Dirk Van Compernolle

Katholieke Universiteit Leuven, Belgium

In this paper, we use prosodic information to improve the accuracy of our template-based automatic speech recognizer. Prosodic information is harvested using a data-driven approach: a number of prosodic features are extracted, combined into major groups, and then studied both separately and jointly. All acoustic evidence, both segmental and suprasegmental, is modelled non-parametrically. The different sources of information are combined with segmental conditional random fields. Prosody improves on the state-of-the-art baseline, reducing the word error rate by 7% relative on the nov92, 20k, trigram Wall Street Journal task.
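As a rough illustration of how a segmental CRF can merge segmental and prosodic evidence, the sketch below scores each hypothesized word segment with a weighted (log-linear) sum of feature values and picks the best of a fixed list of candidate hypotheses. This is a minimal sketch under stated assumptions, not the authors' system: the feature names `template` and `prosody`, the weights, and all numbers are hypothetical toy values. Because the CRF normalizer is shared by all hypotheses of an utterance, the unnormalized score is enough for choosing the best one. The `relative_wer_reduction` helper only spells out what "7% relative" means; its inputs are invented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """One hypothesized word segment with its feature values (hypothetical feature names)."""
    word: str
    features: Dict[str, float]  # e.g. a template-matching score and a prosodic score


def hypothesis_score(segments: List[Segment], weights: Dict[str, float]) -> float:
    """Unnormalized log-linear score: sum over segments of sum_k lambda_k * f_k(segment).

    Features without a weight contribute nothing, which lets us switch a stream off.
    """
    return sum(
        weights.get(name, 0.0) * value
        for seg in segments
        for name, value in seg.features.items()
    )


def best_hypothesis(hyps: List[List[Segment]], weights: Dict[str, float]) -> List[Segment]:
    """Rescore a fixed list of candidate hypotheses and return the highest-scoring one."""
    return max(hyps, key=lambda h: hypothesis_score(h, weights))


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: 0.07 means a 7% relative reduction."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy example (all values are illustrative, not taken from the paper).
weights = {"template": 1.0, "prosody": 0.5}
hyp_a = [Segment("the", {"template": -2.0, "prosody": -1.0}),
         Segment("cat", {"template": -3.0, "prosody": -0.5})]
hyp_b = [Segment("the", {"template": -2.0, "prosody": -1.0}),
         Segment("cap", {"template": -2.9, "prosody": -2.0})]

if __name__ == "__main__":
    # Template evidence alone prefers hyp_b; adding the prosodic stream flips the decision.
    print([s.word for s in best_hypothesis([hyp_a, hyp_b], {"template": 1.0})])
    print([s.word for s in best_hypothesis([hyp_a, hyp_b], weights)])
    print(relative_wer_reduction(10.0, 9.3))
```

In this toy setting the template score alone slightly favours "cap", while the added prosodic feature tips the balance back to "cat"; in a real system the weights would be trained rather than set by hand.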


Bibliographic reference. Seppi, Dino / Demuynck, Kris / Van Compernolle, Dirk (2011): "Template-based automatic speech recognition meets prosody", In INTERSPEECH-2011, 545-548.