ISCA Archive Interspeech 2021

End-to-End Language Diarization for Bilingual Code-Switching Speech

Hexin Liu, Leibny Paola García Perera, Xinyi Zhang, Justin Dauwels, Andy W.H. Khong, Sanjeev Khudanpur, Suzy J. Styles

We propose two end-to-end neural configurations for language diarization of bilingual code-switching speech. The first, a BLSTM-E2E architecture, uses a set of stacked bidirectional LSTMs to compute embeddings and incorporates the deep clustering loss to enforce the grouping of segments belonging to the same language. The second, an XSA-E2E architecture, is based on an x-vector model followed by a self-attention encoder. The former encodes frame-level features into segment-level embeddings, while the latter attends over all of those embeddings to generate a sequence of segment-level language labels. We evaluated the proposed methods on the dataset from shared Task B of WSTCSMC 2020 and on our simulated data constructed from the SEAME dataset. Experimental results show that the proposed XSA-E2E architecture achieved a relative improvement of 12.1% in equal error rate and a 7.4% relative improvement in accuracy over the baseline algorithm on the WSTCSMC 2020 dataset. The XSA-E2E architecture also achieved an accuracy of 89.84%, against a baseline of 85.60%, on the simulated data derived from the SEAME dataset.
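The XSA-E2E pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the x-vector front end is replaced by simple mean pooling of frame-level features into segment embeddings, and the self-attention encoder is a single head with randomly initialized weights; all dimensions and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def segment_embeddings(frames, seg_len):
    # Stand-in for the x-vector front end: mean-pool frame-level
    # features into segment-level embeddings (a deliberate simplification).
    n_seg = frames.shape[0] // seg_len
    return frames[: n_seg * seg_len].reshape(n_seg, seg_len, -1).mean(axis=1)

def self_attention(X, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over all segments,
    # so each segment's language label can depend on the whole utterance.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    return softmax(scores, axis=-1) @ V

d_feat, d_model, n_lang, seg_len = 24, 16, 2, 10
frames = rng.standard_normal((100, d_feat))        # frame-level acoustic features
W_in = rng.standard_normal((d_feat, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
W_out = rng.standard_normal((d_model, n_lang))

segs = segment_embeddings(frames, seg_len) @ W_in  # (10, d_model) segment embeddings
ctx = self_attention(segs, Wq, Wk, Wv)             # (10, d_model) contextualized
probs = softmax(ctx @ W_out)                       # per-segment language posteriors
print(probs.shape)
```

Each row of `probs` is a posterior over the two languages for one segment; in the bilingual code-switching setting, taking the argmax per segment yields the segment-level language label sequence.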

doi: 10.21437/Interspeech.2021-82

Cite as: Liu, H., Perera, L.P.G., Zhang, X., Dauwels, J., Khong, A.W.H., Khudanpur, S., Styles, S.J. (2021) End-to-End Language Diarization for Bilingual Code-Switching Speech. Proc. Interspeech 2021, 1489-1493, doi: 10.21437/Interspeech.2021-82

@inproceedings{liu21_interspeech,
  author={Hexin Liu and Leibny Paola García Perera and Xinyi Zhang and Justin Dauwels and Andy W.H. Khong and Sanjeev Khudanpur and Suzy J. Styles},
  title={{End-to-End Language Diarization for Bilingual Code-Switching Speech}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1489--1493},
  doi={10.21437/Interspeech.2021-82}
}