Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks

Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen


Furui first demonstrated that the identities of both the consonant and the vowel can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception and that steady-state regions are secondary or supplemental. Acoustic landmarks are perceptually salient even in a language one does not speak; it has been demonstrated that non-speakers of a language can identify features such as the primary articulator of a landmark. These factors suggest a strategy for developing language-independent automatic speech recognition: landmarks can potentially be learned once from a suitably labeled corpus and rapidly applied to many other languages. This paper proposes enhancing the cross-lingual portability of a neural network by using landmark detection as the secondary task in multi-task learning (MTL). The network is trained in a well-resourced source language (English) with both phone and landmark labels, then adapted to an under-resourced target language (Iban) with only word labels. Landmark-tasked MTL reduces source-language phone error rate by 2.9% relative and reduces target-language word error rate by 1.9%-5.9%, depending on the amount of target-language training data. These results suggest that landmark-tasked MTL causes the DNN to learn hidden-node features that are useful for cross-lingual adaptation.
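The multi-task setup described above can be sketched as a shared network with two task-specific output heads, whose losses are combined during training. The following is a minimal NumPy illustration, not the authors' implementation: the layer sizes, class counts, and the loss-weighting value `lam` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative sizes (assumptions, not taken from the paper).
N_FEAT, N_HID = 40, 64    # acoustic feature dim, shared hidden width
N_PHONE, N_LMK = 48, 6    # phone classes; landmark classes

# Shared hidden layer: the representation that MTL regularizes and
# that is reused when adapting to the under-resourced language.
W_shared = rng.normal(scale=0.1, size=(N_FEAT, N_HID))
# Two task-specific output heads.
W_phone = rng.normal(scale=0.1, size=(N_HID, N_PHONE))
W_lmk = rng.normal(scale=0.1, size=(N_HID, N_LMK))

def forward(x):
    h = np.tanh(x @ W_shared)                 # shared representation
    return softmax(h @ W_phone), softmax(h @ W_lmk)

def mtl_loss(x, y_phone, y_lmk, lam=0.3):
    """Cross-entropy of the primary (phone) task plus a weighted
    landmark term; lam is a made-up weighting hyperparameter."""
    p_phone, p_lmk = forward(x)
    ce = lambda p, y: -np.log(p[np.arange(len(y)), y]).mean()
    return ce(p_phone, y_phone) + lam * ce(p_lmk, y_lmk)

x = rng.normal(size=(8, N_FEAT))              # a batch of 8 frames
loss = mtl_loss(x,
                rng.integers(0, N_PHONE, 8),  # random phone labels
                rng.integers(0, N_LMK, 8))    # random landmark labels
```

In this sketch, cross-lingual transfer would correspond to keeping `W_shared` and replacing the output heads for the target language.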


DOI: 10.21437/Interspeech.2018-1124

Cite as: He, D., Lim, B.P., Yang, X., Hasegawa-Johnson, M., Chen, D. (2018) Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks. Proc. Interspeech 2018, 2618-2622, DOI: 10.21437/Interspeech.2018-1124.


@inproceedings{He2018,
  author={Di He and Boon Pang Lim and Xuesong Yang and Mark Hasegawa-Johnson and Deming Chen},
  title={Improved ASR for Under-resourced Languages through Multi-task Learning with Acoustic Landmarks},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2618--2622},
  doi={10.21437/Interspeech.2018-1124},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1124}
}