16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Prosodically-Enhanced Recurrent Neural Network Language Models

Siva Reddy Gangireddy (1), Steve Renals (1), Yoshihiko Nankaku (2), Akinobu Lee (2)

(1) University of Edinburgh, UK
(2) Nagoya Institute of Technology, Japan

Recurrent neural network language models (RNNLMs) have been shown to consistently reduce the word error rates (WERs) of large vocabulary speech recognition tasks. In this work we propose to enhance RNNLMs with prosodic features computed using the context of the current word. Since prosodic features can be computed at either the word or the syllable level, we have trained models on prosodic features computed at both levels. To investigate the effectiveness of the proposed models, we report perplexity and WER for two speech recognition tasks, Switchboard and TED. We observed substantial improvements in perplexity and small improvements in WER.
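The abstract does not specify how the prosodic features enter the network. One plausible arrangement is shown below as a minimal sketch, assuming an Elman-style RNNLM whose hidden layer receives, besides the one-hot current word and the previous hidden state, an extra per-word prosody feature vector. The class `ProsodyRNNLM`, its weight names, the feature dimensionality and the example feature values are all hypothetical illustrations, not the authors' model. The perplexity computed here is the standard exp of the mean negative log-probability of each next word.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    zmax = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - zmax) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


class ProsodyRNNLM:
    """Hypothetical Elman RNNLM with an auxiliary prosody feature input.

    ASSUMPTION: the prosody vector feeds only the hidden layer; the paper
    may integrate its word- or syllable-level features differently.
    """

    def __init__(self, vocab_size, hidden_size, feat_size, seed=0):
        rng = random.Random(seed)
        self.V, self.H = vocab_size, hidden_size
        self.U = make_matrix(hidden_size, vocab_size, rng)   # word -> hidden
        self.W = make_matrix(hidden_size, hidden_size, rng)  # hidden -> hidden (recurrent)
        self.F = make_matrix(hidden_size, feat_size, rng)    # prosody features -> hidden
        self.O = make_matrix(vocab_size, hidden_size, rng)   # hidden -> output

    def step(self, word_id, feats, h_prev):
        # A one-hot word input reduces U @ x to column word_id of U.
        word_part = [row[word_id] for row in self.U]
        rec_part = matvec(self.W, h_prev)
        feat_part = matvec(self.F, feats)
        h = [1.0 / (1.0 + math.exp(-(a + b + c)))
             for a, b, c in zip(word_part, rec_part, feat_part)]
        return h, softmax(matvec(self.O, h))

    def perplexity(self, word_ids, feat_seq):
        """Perplexity of word_ids[1:], conditioning each prediction on the
        word history and the prosody features of the current word."""
        h = [0.0] * self.H
        log_prob = 0.0
        for t in range(len(word_ids) - 1):
            h, p = self.step(word_ids[t], feat_seq[t], h)
            log_prob += math.log(p[word_ids[t + 1]])
        return math.exp(-log_prob / (len(word_ids) - 1))


# Toy usage: 5-word vocabulary, hypothetical 3-dim prosody vectors
# (e.g. normalised duration, pitch and energy of each word).
model = ProsodyRNNLM(vocab_size=5, hidden_size=8, feat_size=3)
words = [0, 3, 1, 4, 2]
feats = [[0.2, 1.0, 0.0], [0.5, -0.3, 0.8], [0.1, 0.0, -1.0],
         [0.9, 0.4, 0.2], [0.0, 0.0, 0.0]]
print(model.perplexity(words, feats))
```

The weights here are untrained, so the printed perplexity only shows the computation; comparing such a model against one with the feature input removed, after training both, is the kind of perplexity comparison the abstract reports.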


Bibliographic reference.  Gangireddy, Siva Reddy / Renals, Steve / Nankaku, Yoshihiko / Lee, Akinobu (2015): "Prosodically-enhanced recurrent neural network language models", In INTERSPEECH-2015, 2390-2394.