Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification

Peng Shen, Xugang Lu, Sheng Li, Hisashi Kawai


The performance of spoken language identification (LID) on short utterances degrades drastically even when the model is trained entirely on a short-utterance data set. This degradation stems from pattern confusion caused by the large variation in feature representations of short utterances. In this paper, we propose a teacher-student network learning algorithm to explore discriminative features for short utterances. With teacher-student network learning, the feature representations of short utterances (produced by the student network) are normalized toward the representations of the corresponding long utterances (provided by the teacher network). With this learning algorithm, the feature representations of short utterances are expected to exhibit less pattern confusion. Experiments on a 10-language LID task were carried out to test the algorithm. Our results show that the proposed algorithm significantly improves performance.
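The core idea, matching the student's short-utterance embedding to the teacher's long-utterance embedding, can be illustrated with a toy sketch. This is not the paper's implementation: the linear "networks", dimensions, feature vectors, and plain MSE distillation loss are all illustrative assumptions.

```python
import random

random.seed(0)
DIM = 4  # illustrative embedding dimension, not from the paper

def embed(weights, feats):
    # A linear stand-in for a network: one embedding value per weight row.
    return [sum(w * f for w, f in zip(row, feats)) for row in weights]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Fixed teacher weights; trainable student weights (random init).
teacher_w = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
student_w = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]

long_feats = [0.9, 0.1, -0.4, 0.7]   # features from the full utterance
short_feats = [0.8, 0.0, -0.5, 0.6]  # noisier features from a short clip

# Teacher embedding of the long utterance serves as the fixed target.
target = embed(teacher_w, long_feats)

lr = 0.05
losses = []
for step in range(200):
    pred = embed(student_w, short_feats)
    losses.append(mse(pred, target))
    # Gradient of MSE w.r.t. student weights: (2/DIM) * (pred - target) * feats
    for i in range(DIM):
        g = 2.0 * (pred[i] - target[i]) / DIM
        for j in range(DIM):
            student_w[i][j] -= lr * g * short_feats[j]

# The student's short-utterance embedding converges toward the teacher's.
assert losses[-1] < 0.1 * losses[0]
```

In the paper's setting the teacher and student are full LID networks and the distillation objective is combined with the classification loss; the sketch above keeps only the representation-matching step.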


DOI: 10.21437/Interspeech.2018-1519

Cite as: Shen, P., Lu, X., Li, S., Kawai, H. (2018) Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification. Proc. Interspeech 2018, 1813-1817, DOI: 10.21437/Interspeech.2018-1519.


@inproceedings{Shen2018,
  author={Peng Shen and Xugang Lu and Sheng Li and Hisashi Kawai},
  title={Feature Representation of Short Utterances Based on Knowledge Distillation for Spoken Language Identification},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={1813--1817},
  doi={10.21437/Interspeech.2018-1519},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1519}
}