Online Incremental Learning for Speaker-Adaptive Language Models

Chih Chi Hu, Bing Liu, John Shen, Ian Lane


Voice control is a prominent interaction method on personal computing devices. While automatic speech recognition (ASR) systems are readily applicable to large audiences, there is room for further adaptation at the edge, i.e., locally on devices, targeted at individual users. In this work, we explore improving ASR systems over time through a user's own interactions. Our online learning approach for speaker-adaptive language modeling leverages a user's most recent utterances to enhance speaker-dependent features and traits. We experiment with the large-vocabulary continuous speech recognition corpus Tedlium v2 and demonstrate an average reduction in perplexity (PPL) of 19.18% and an average relative reduction in word error rate (WER) of 2.80% compared to a state-of-the-art baseline.
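
For readers unfamiliar with the two metrics quoted in the abstract, the following minimal Python sketch shows how perplexity and a relative reduction are conventionally computed. It is not the authors' evaluation code: the function names `perplexity` and `relative_reduction` are hypothetical, and the token probabilities are made up for illustration rather than taken from the paper.

```python
import math


def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities."""
    if not log_probs:
        raise ValueError("need at least one token")
    # exp of the negative mean log-probability
    return math.exp(-sum(log_probs) / len(log_probs))


def relative_reduction(baseline, adapted):
    """Relative reduction of a lower-is-better metric, in percent."""
    return 100.0 * (baseline - adapted) / baseline


# Illustrative values only (assumption): every token gets the same probability.
baseline_ppl = perplexity([math.log(0.1)] * 4)    # baseline model
adapted_ppl = perplexity([math.log(0.125)] * 4)   # hypothetical adapted model
ppl_gain = relative_reduction(baseline_ppl, adapted_ppl)

print(f"baseline PPL={baseline_ppl:.2f}, adapted PPL={adapted_ppl:.2f}, "
      f"relative reduction={ppl_gain:.2f}%")
```

The same `relative_reduction` helper applies to WER, since both metrics are lower-is-better; a relative figure is taken with respect to the baseline value, not as a difference in percentage points.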


DOI: 10.21437/Interspeech.2018-2259

Cite as: Hu, C.C., Liu, B., Shen, J., Lane, I. (2018) Online Incremental Learning for Speaker-Adaptive Language Models. Proc. Interspeech 2018, 3363-3367, DOI: 10.21437/Interspeech.2018-2259.


@inproceedings{Hu2018,
  author={Chih Chi Hu and Bing Liu and John Shen and Ian Lane},
  title={Online Incremental Learning for Speaker-Adaptive Language Models},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3363--3367},
  doi={10.21437/Interspeech.2018-2259},
  url={http://dx.doi.org/10.21437/Interspeech.2018-2259}
}