ISCA Archive Interspeech 2021

Multi-Domain Knowledge Distillation via Uncertainty-Matching for End-to-End ASR Models

Ho-Gyeong Kim, Min-Joong Lee, Hoshik Lee, Tae Gyoon Kang, Jihyun Lee, Eunho Yang, Sung Ju Hwang

Knowledge distillation matches the predictive distributions of student and teacher networks to improve performance under model-capacity and/or data constraints. However, it is well known that the predictive distribution of a neural network not only tends to be overly confident, but also cannot directly model the various factors that contribute to uncertainty. Recently, uncertainty-based deep learning has been successful in various fields, particularly in several computer vision tasks. The prediction probability implicitly conveys how confident the network is, but by explicitly modeling the network's uncertainty we can make direct use of that confidence. In this paper, we propose a novel knowledge distillation method for automatic speech recognition that directly models and transfers the uncertainty inherent in data observation, such as speaker variation or confusable pronunciations. Moreover, we investigate how to transfer knowledge more effectively using multiple teachers trained on different domains. Evaluated on WSJ, a standard benchmark dataset with a limited number of training instances, the proposed knowledge distillation method achieves significant improvements over student baseline models.
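To make the idea concrete, the sketch below shows a generic multi-teacher distillation loss that combines standard distribution matching with an uncertainty-matching term. This is a minimal illustration of the general recipe described in the abstract, not the paper's exact formulation; the per-teacher log-variance heads, the MSE-based uncertainty matching, and the weighting hyperparameters (alpha, beta, temperature) are all assumptions made for illustration.

```python
# Illustrative sketch only (PyTorch): generic uncertainty-matching KD with
# multiple domain teachers. Not the authors' exact loss; the log-variance
# heads and the MSE uncertainty term are assumptions for demonstration.
import torch
import torch.nn.functional as F


def kd_uncertainty_loss(student_logits, student_log_var,
                        teacher_outputs, targets,
                        temperature=2.0, alpha=0.5, beta=0.1):
    """Cross-entropy + distribution-matching KD + uncertainty matching.

    teacher_outputs: list of (logits, log_var) pairs, one per domain teacher.
    """
    # Supervised term against ground-truth labels.
    ce = F.cross_entropy(student_logits, targets)

    kd, unc = 0.0, 0.0
    for t_logits, t_log_var in teacher_outputs:
        # Classic KD: KL between temperature-softened distributions.
        kd = kd + F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(t_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Uncertainty matching: align the student's predicted (aleatoric)
        # log-variance with each teacher's predicted log-variance.
        unc = unc + F.mse_loss(student_log_var, t_log_var)

    n = len(teacher_outputs)
    return ce + alpha * kd / n + beta * unc / n
```

In this sketch each network carries an extra head that predicts a per-frame log-variance alongside its output distribution; averaging the KD and uncertainty terms over teachers is one simple way to aggregate multiple domain-specific teachers.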


doi: 10.21437/Interspeech.2021-1169

Cite as: Kim, H.-G., Lee, M.-J., Lee, H., Kang, T.G., Lee, J., Yang, E., Hwang, S.J. (2021) Multi-Domain Knowledge Distillation via Uncertainty-Matching for End-to-End ASR Models. Proc. Interspeech 2021, 2531-2535, doi: 10.21437/Interspeech.2021-1169

@inproceedings{kim21g_interspeech,
  author={Ho-Gyeong Kim and Min-Joong Lee and Hoshik Lee and Tae Gyoon Kang and Jihyun Lee and Eunho Yang and Sung Ju Hwang},
  title={{Multi-Domain Knowledge Distillation via Uncertainty-Matching for End-to-End ASR Models}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={2531--2535},
  doi={10.21437/Interspeech.2021-1169}
}