Recurrent neural network language models (RNNLMs) have been shown to consistently reduce the word error rates (WERs) of large-vocabulary speech recognition tasks. In this work we propose to enhance RNNLMs with prosodic features computed over the context of the current word. Since prosodic features can be computed at both the word and the syllable level, we have trained models on features computed at each of these levels. To investigate the effectiveness of the proposed models, we report perplexity and WER on two speech recognition tasks, Switchboard and TED. We observed substantial improvements in perplexity and small improvements in WER.
Bibliographic reference. Gangireddy, Siva Reddy / Renals, Steve / Nankaku, Yoshihiko / Lee, Akinobu (2015): "Prosodically-enhanced recurrent neural network language models", In INTERSPEECH-2015, 2390-2394.