Improving DNNs Trained with Non-Native Transcriptions Using Knowledge Distillation and Target Interpolation

Amit Das, Mark Hasegawa-Johnson


Native transcribers are often hard to find for under-resourced languages. However, Turkers (crowd workers) available in online marketplaces can serve as a valuable alternative by providing transcriptions in the target language. Since the Turkers may neither speak nor have any familiarity with the target language, their transcriptions are non-native by nature and usually contain incorrect labels. After some post-processing, these transcriptions can be converted to Probabilistic Transcriptions (PTs). Conventional Deep Neural Networks (DNNs) trained using PTs do not necessarily improve error rates over Gaussian Mixture Models (GMMs) due to the presence of label noise. Previously reported results have demonstrated some success by adopting Multi-Task Learning (MTL) training for PTs. In this study, we report further improvements using Knowledge Distillation (KD) and Target Interpolation (TI) to alleviate transcription errors in PTs. In the KD method, knowledge is transferred from a well-trained multilingual DNN to the target language DNN trained using PTs. In the TI method, the confidences of the labels provided by PTs are modified using the confidences of the target language DNN. Results show an average absolute improvement in phone error rates (PER) of about 1.9% across Swahili, Amharic, Dinka and Mandarin using each proposed method.
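The abstract does not give the exact training losses, but both KD and TI can be read as blending the noisy PT labels with model posteriors before computing cross-entropy. Below is a minimal NumPy sketch of that general idea; the function names, the interpolation weight `lam`, and the temperature parameter `T` are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over the last axis (numerically stable).
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def interpolated_targets(pt_labels, model_probs, lam):
    """Blend noisy PT label distributions with model posteriors.

    For KD, `model_probs` would come from the well-trained multilingual
    teacher DNN; for TI, from the target-language DNN itself.
    `lam` in [0, 1] is an assumed interpolation weight.
    """
    return (1.0 - lam) * pt_labels + lam * model_probs

def cross_entropy(targets, student_probs, eps=1e-12):
    # Mean frame-level cross-entropy against the soft targets.
    return -np.mean(np.sum(targets * np.log(student_probs + eps), axis=-1))

# Toy example: one frame, three phone classes.
pt = np.array([[1.0, 0.0, 0.0]])                 # possibly mislabeled PT target
teacher = softmax(np.array([[2.0, 1.0, 0.0]]))   # teacher posterior
soft = interpolated_targets(pt, teacher, lam=0.5)
loss = cross_entropy(soft, teacher)
```

Since both soft-target distributions are convex combinations of valid distributions, they still sum to one, so standard cross-entropy training applies unchanged.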


DOI: 10.21437/Interspeech.2018-1450

Cite as: Das, A., Hasegawa-Johnson, M. (2018) Improving DNNs Trained with Non-Native Transcriptions Using Knowledge Distillation and Target Interpolation. Proc. Interspeech 2018, 2434-2438, DOI: 10.21437/Interspeech.2018-1450.


@inproceedings{Das2018,
  author={Amit Das and Mark Hasegawa-Johnson},
  title={Improving DNNs Trained with Non-Native Transcriptions Using Knowledge Distillation and Target Interpolation},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={2434--2438},
  doi={10.21437/Interspeech.2018-1450},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1450}
}