Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition

Shohei Toyama, Daisuke Saito, Nobuaki Minematsu


In this study, we propose a new method of adapting language models for speech recognition using para-linguistic and extra-linguistic features in speech. When we talk with others, we often change our lexical choices and speaking style according to various contextual factors. This suggests that the performance of automatic speech recognition can be improved by taking these contextual factors into account, and that the factors can be estimated from speech acoustics. We attempt to find global and acoustic features that are associated with those contextual factors, and then integrate those features into Recurrent Neural Network (RNN) language models for speech recognition. In experiments using Japanese spontaneous speech corpora, we examine how i-vectors and openSMILE features are associated with contextual factors. We then use those features in the reranking process with RNN-based language models. Results show that perplexity is reduced by 16% relative and word error rate by 2.1% relative for highly emotional speech.
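
The reranking step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it only assumes the common N-best setup in which each hypothesis carries a first-pass log score and a language model, conditioned on an utterance-level feature vector, adds a weighted log-probability. The names `rerank_nbest`, `toy_lm_logprob`, and `lm_weight` are hypothetical, and the toy scorer is a stand-in for an RNN language model, not a model of one.

```python
from typing import Callable, List, Sequence, Tuple

# One hypothesis: (word sequence, first-pass log score). Assumed layout.
Hypothesis = Tuple[List[str], float]


def rerank_nbest(
    nbest: Sequence[Hypothesis],
    lm_logprob: Callable[[List[str], Sequence[float]], float],
    utterance_features: Sequence[float],
    lm_weight: float = 0.5,
) -> List[List[str]]:
    """Return the word sequences sorted by combined score, best first."""
    scored = []
    for words, first_pass_score in nbest:
        # Combined score: first-pass score plus weighted LM log-probability,
        # where the LM also sees the utterance-level feature vector.
        total = first_pass_score + lm_weight * lm_logprob(words, utterance_features)
        scored.append((total, words))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [words for _, words in scored]


def toy_lm_logprob(words: List[str], features: Sequence[float]) -> float:
    """Hypothetical stand-in for a feature-conditioned language model.

    Every word gets a flat log-probability of -2.0; words in a small
    'casual' set are shifted by the first feature value.
    """
    casual = {"yeah", "gonna"}
    score = 0.0
    for word in words:
        score += -2.0
        if word in casual:
            score += features[0]
    return score
```

With this toy scorer, changing the feature vector can change which hypothesis ranks first, which is the effect the adaptation aims for. A real system would replace `toy_lm_logprob` with a trained model and tune `lm_weight` on held-out data.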


DOI: 10.21437/Interspeech.2017-717

Cite as: Toyama, S., Saito, D., Minematsu, N. (2017) Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition. Proc. Interspeech 2017, 543-547, DOI: 10.21437/Interspeech.2017-717.


@inproceedings{Toyama2017,
  author={Shohei Toyama and Daisuke Saito and Nobuaki Minematsu},
  title={Use of Global and Acoustic Features Associated with Contextual Factors to Adapt Language Models for Spontaneous Speech Recognition},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={543--547},
  doi={10.21437/Interspeech.2017-717},
  url={http://dx.doi.org/10.21437/Interspeech.2017-717}
}