On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition

Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Haihua Xu, Eng Siong Chng, Haizhou Li


Code-switching (CS) refers to a linguistic phenomenon where a speaker uses different languages in an utterance or between alternating utterances. In this work, we study end-to-end (E2E) approaches to the Mandarin-English code-switching speech recognition task. We first examine the effectiveness of using data augmentation and byte-pair encoding (BPE) subword units. More importantly, we propose a multitask learning recipe, where a language identification task is explicitly learned in addition to the E2E speech recognition task. Furthermore, we introduce an efficient word vocabulary expansion method for language modeling to alleviate data sparsity issues under the code-switching scenario. Experimental results on the SEAME data, a Mandarin-English code-switching corpus, demonstrate the effectiveness of the proposed methods.
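As a rough illustration of the multitask idea described above, the sketch below derives token-level language labels from a code-switched transcript and combines two losses into one training objective. It is a minimal sketch under stated assumptions, not the authors' recipe: the helper names (`language_tag`, `tag_transcript`, `multitask_loss`), the Unicode-range heuristic for spotting Mandarin tokens, the linear interpolation form, and the default weight of 0.1 are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def language_tag(token: str) -> str:
    """Tag a token as 'ZH', 'EN', or 'OTHER' (hypothetical heuristic)."""
    # Any character in the CJK Unified Ideographs block marks the token as Mandarin.
    if any("\u4e00" <= ch <= "\u9fff" for ch in token):
        return "ZH"
    # Pure-ASCII tokens containing at least one letter are treated as English.
    if token.isascii() and any(ch.isalpha() for ch in token):
        return "EN"
    return "OTHER"


def tag_transcript(transcript: str) -> List[Tuple[str, str]]:
    """Pair each whitespace-separated token with its language tag."""
    return [(tok, language_tag(tok)) for tok in transcript.split()]


def multitask_loss(asr_loss: float, lid_loss: float, lid_weight: float = 0.1) -> float:
    """Interpolate the recognition loss with the language identification loss.

    The interpolation form and the default weight are assumptions, not
    values taken from the paper.
    """
    return (1.0 - lid_weight) * asr_loss + lid_weight * lid_loss


if __name__ == "__main__":
    for token, tag in tag_transcript("我 想 去 shopping"):
        print(token, tag)
    print(multitask_loss(asr_loss=2.0, lid_loss=1.0))
```

In a real system the language labels would act as targets for an auxiliary output of the network, and both losses would be computed on tensors rather than floats; the scalar version here only shows how the two objectives are weighted against each other.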


DOI: 10.21437/Interspeech.2019-1429

Cite as: Zeng, Z., Khassanov, Y., Pham, V.T., Xu, H., Chng, E.S., Li, H. (2019) On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition. Proc. Interspeech 2019, 2165-2169, DOI: 10.21437/Interspeech.2019-1429.


@inproceedings{Zeng2019,
  author={Zhiping Zeng and Yerbolat Khassanov and Van Tung Pham and Haihua Xu and Eng Siong Chng and Haizhou Li},
  title={{On the End-to-End Solution to Mandarin-English Code-Switching Speech Recognition}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2165--2169},
  doi={10.21437/Interspeech.2019-1429},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1429}
}