Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese

Sheng Li, Xugang Lu, Chenchen Ding, Peng Shen, Tatsuya Kawahara, Hisashi Kawai


Training automatic speech recognition (ASR) systems for East Asian languages (e.g., Chinese and Japanese) is challenging because of the large character inventories of their writing systems. Traditionally, we first need to obtain the pronunciation of these characters by morphological analysis. The end-to-end (E2E) model allows characters or words to be used directly as the modeling unit. However, since different groups of people (e.g., residents of mainland China, Hong Kong, Taiwan, and Japan) adopt different written forms of a character, the vocabulary size grows substantially, especially when building ASR systems across languages or dialects. In this paper, we propose a new E2E ASR modeling method that decomposes characters into a set of radicals. Our experiments demonstrate that the vocabulary size can be effectively reduced by sharing the basic radicals across different dialects of Chinese. Moreover, we demonstrate that this method can also be used to construct a Japanese E2E ASR system. The system modeled with radicals and kana achieved performance similar to a state-of-the-art E2E system built with word-piece units.
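The core idea, decomposing each character into shared radical sub-units to shrink the modeling vocabulary, can be illustrated with a minimal Python sketch. The decomposition table below is a toy assumption for illustration only; it is not the paper's actual radical inventory or mapping.

```python
# Toy sketch of radical-based unit decomposition (illustrative only).
# The RADICALS table is an assumed, simplified mapping, not the
# decomposition used in the paper.
RADICALS = {
    "马": ["马"],              # a radical may stand alone
    "妈": ["女", "马"],        # assumed decomposition
    "吗": ["口", "马"],        # assumed decomposition
    "骂": ["口", "口", "马"],  # assumed decomposition
}

def decompose(text):
    """Map each character to its radical sequence.

    Characters missing from the table fall back to themselves,
    so the output is always a valid unit sequence.
    """
    units = []
    for ch in text:
        units.extend(RADICALS.get(ch, [ch]))
    return units

# Four distinct characters share only three radical units,
# so the modeling vocabulary shrinks as characters are added.
char_vocab = set(RADICALS)
radical_vocab = {r for seq in RADICALS.values() for r in seq}
```

Because many characters reuse the same radicals (here 马, 女, and 口), the unit inventory grows far more slowly than the character inventory, which is the effect the paper exploits across Chinese dialects and Japanese.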


 DOI: 10.21437/Interspeech.2019-2104

Cite as: Li, S., Lu, X., Ding, C., Shen, P., Kawahara, T., Kawai, H. (2019) Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese. Proc. Interspeech 2019, 2200-2204, DOI: 10.21437/Interspeech.2019-2104.


@inproceedings{Li2019,
  author={Sheng Li and Xugang Lu and Chenchen Ding and Peng Shen and Tatsuya Kawahara and Hisashi Kawai},
  title={{Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={2200--2204},
  doi={10.21437/Interspeech.2019-2104},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2104}
}