ISCA Archive Interspeech 2021

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition

Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

Pseudo-labeling (PL) has been shown to be effective in semi-supervised automatic speech recognition (ASR), where a base model is self-trained with pseudo-labels generated from unlabeled data. While PL can be further improved by iteratively updating pseudo-labels as the model evolves, most previous approaches involve inefficient retraining of the model or intricate control of the label update. We present momentum pseudo-labeling (MPL), a simple yet effective strategy for semi-supervised ASR. MPL consists of a pair of online and offline models that interact and learn from each other, inspired by the mean teacher method. The online model is trained to predict pseudo-labels generated on the fly by the offline model, while the offline model maintains a momentum-based moving average of the online model. MPL is performed in a single training process, and the interaction between the two models effectively helps them reinforce each other to improve ASR performance. We apply MPL to an end-to-end ASR model based on connectionist temporal classification (CTC). The experimental results demonstrate that MPL effectively improves over the base model and is scalable to different semi-supervised scenarios with varying amounts of data or domain mismatch.
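The momentum-based moving average described above can be sketched as a simple exponential moving average (EMA) over model parameters. The function name, dict-based parameter representation, and coefficient value below are illustrative assumptions, not the paper's exact implementation:

```python
def ema_update(online_params, offline_params, alpha=0.999):
    """One momentum step: offline <- alpha * offline + (1 - alpha) * online.

    Parameters are plain name->value dicts standing in for model weights;
    alpha is an assumed momentum coefficient (the paper's exact value and
    schedule may differ).
    """
    return {
        name: alpha * offline_params[name] + (1.0 - alpha) * online_params[name]
        for name in offline_params
    }

# Example: the offline model drifts slowly toward the online model.
offline = {"w": 0.0}
online = {"w": 1.0}
offline = ema_update(online, offline, alpha=0.9)
```

With a momentum close to 1, the offline model changes slowly and so produces more stable pseudo-labels for the online model to train on.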


doi: 10.21437/Interspeech.2021-571

Cite as: Higuchi, Y., Moritz, N., Le Roux, J., Hori, T. (2021) Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition. Proc. Interspeech 2021, 726-730, doi: 10.21437/Interspeech.2021-571

@inproceedings{higuchi21_interspeech,
  author={Yosuke Higuchi and Niko Moritz and Jonathan Le Roux and Takaaki Hori},
  title={{Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition}},
  year={2021},
  booktitle={Proc. Interspeech 2021},
  pages={726--730},
  doi={10.21437/Interspeech.2021-571}
}