End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition

Sheng Li, Chenchen Ding, Xugang Lu, Peng Shen, Tatsuya Kawahara, Hisashi Kawai


The end-to-end (E2E) model allows automatic speech recognition (ASR) systems to be trained without hand-designed, language-specific pronunciation lexicons. However, constructing a multilingual low-resource E2E ASR system remains challenging due to the vast number of output symbols (e.g., words and characters). In this paper, we investigate an efficient method of encoding multilingual transcriptions for training E2E ASR systems. We directly encode the symbols of multilingual writing systems as universal articulatory representations, which are much more compact than characters and words. In contrast to traditional multilingual modeling methods, we build a single acoustic-articulatory model within the recent transformer-based E2E framework for ASR tasks. The speech recognition results of our proposed method significantly outperform those of conventional word-based and character-based E2E models.
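To illustrate the core idea, here is a minimal sketch of mapping phonemes to language-universal articulatory attributes. The attribute inventory and mappings below are hypothetical examples for illustration only, not the paper's actual tables; any real system would cover a full phoneme inventory.

```python
# Hypothetical illustration: instead of training an E2E model on
# language-specific characters or words, map each phoneme to a compact,
# language-universal tuple of articulatory attributes.
# Consonants: (manner, place, voicing); vowels: (height, backness, rounding).
# This toy inventory is NOT from the paper; it only sketches the encoding.
ATTRS = {
    "p": ("stop", "bilabial", "voiceless"),
    "b": ("stop", "bilabial", "voiced"),
    "t": ("stop", "alveolar", "voiceless"),
    "s": ("fricative", "alveolar", "voiceless"),
    "a": ("low", "central", "unrounded"),
    "i": ("high", "front", "unrounded"),
}

def encode(phonemes):
    """Encode a phoneme sequence as a sequence of attribute tuples."""
    return [ATTRS[p] for p in phonemes]

# Words from different languages share the same small attribute vocabulary,
# so the output symbol set stays compact across languages.
print(encode(["b", "a", "t"]))
```

Because the attribute vocabulary is shared across languages, the E2E model's output layer stays small even as new languages are added, which is the compactness advantage the abstract describes.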


DOI: 10.21437/Interspeech.2019-2092

Cite as: Li, S., Ding, C., Lu, X., Shen, P., Kawahara, T., Kawai, H. (2019) End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition. Proc. Interspeech 2019, 2145-2149, DOI: 10.21437/Interspeech.2019-2092.


@inproceedings{Li2019,
  author={Sheng Li and Chenchen Ding and Xugang Lu and Peng Shen and Tatsuya Kawahara and Hisashi Kawai},
  title={{End-to-End Articulatory Attribute Modeling for Low-Resource Multilingual Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2145--2149},
  doi={10.21437/Interspeech.2019-2092},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2092}
}