ISCA Archive SSW 2019

Building Multilingual End-to-End Speech Synthesisers for Indian Languages

Anusha Prakash, Anju Leela Thomas, S. Umesh, Hema A Murthy

Building text-to-speech (TTS) synthesisers is a difficult task, especially for low-resource languages. Language-specific modules need to be developed for system building. End-to-end speech synthesis has become a popular paradigm, as a TTS system can be trained using only <text, audio> pairs. However, end-to-end speech synthesis does not scale well to a multilingual scenario, as the vocabulary grows with the number of different scripts. In this paper, TTS systems are trained for Indian languages using two text representations: character-based and phone-based. For the character-based approach, a multi-language character map (MLCM) is proposed to easily train Indic speech synthesisers. The phone-based approach uses the common label set (CLS) representation for Indian languages. Both approaches leverage the similarities that exist among the languages, and their advantage is a compact representation across multiple languages. Experiments are conducted by building TTS systems using monolingual data and by pooling data across two languages. The ability to synthesise code-mixed text using the phone-based approach is also assessed. Subjective evaluations indicate that Indic TTS systems of reasonably good quality can be developed using both approaches. This emphasises the need to incorporate multilingual text processing in the end-to-end framework.
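
The idea of a compact character representation shared across scripts can be illustrated with a minimal sketch. This is not the authors' MLCM, whose actual table is not given on this page. It relies on one fact about Unicode: the nine major Indic script blocks from Devanagari (U+0900) to Malayalam (U+0D7F) are each 128 code points wide and place corresponding letters at the same offset. The names `to_common_token` and `encode` and the `I..` token format are hypothetical, and the mapping is only approximate, since some offsets hold script-specific characters.

```python
# Illustrative sketch only: NOT the MLCM proposed in the paper.
# Assumption: corresponding letters sit at the same offset within each
# 128-code-point Indic Unicode block (Devanagari ... Malayalam).

INDIC_START = 0x0900   # first code point of the Devanagari block
INDIC_END = 0x0D7F     # last code point of the Malayalam block
BLOCK_SIZE = 0x80      # each of the nine blocks spans 128 code points


def to_common_token(ch: str) -> str:
    """Map one character to a script-independent token (hypothetical format)."""
    cp = ord(ch)
    if INDIC_START <= cp <= INDIC_END:
        offset = (cp - INDIC_START) % BLOCK_SIZE
        return f"I{offset:02X}"
    # Characters outside the Indic blocks (spaces, Latin, digits) pass through.
    return ch


def encode(text: str) -> list[str]:
    """Convert a string into a list of shared tokens."""
    return [to_common_token(ch) for ch in text]


if __name__ == "__main__":
    hindi = "\u0915\u092e"    # Devanagari letters KA, MA
    telugu = "\u0c15\u0c2e"   # Telugu letters KA, MA
    print(encode(hindi), encode(telugu))
```

With such a mapping, pooling data from two languages with different scripts adds few or no new symbols to the input vocabulary, which is the property the abstract attributes to a compact multilingual representation.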

doi: 10.21437/SSW.2019-35

Cite as: Prakash, A., Leela Thomas, A., Umesh, S., A Murthy, H. (2019) Building Multilingual End-to-End Speech Synthesisers for Indian Languages. Proc. 10th ISCA Workshop on Speech Synthesis (SSW 10), 194-199, doi: 10.21437/SSW.2019-35

  author={Anusha Prakash and Anju {Leela Thomas} and S. Umesh and Hema {A Murthy}},
  title={{Building Multilingual End-to-End Speech Synthesisers for Indian Languages}},
  booktitle={Proc. 10th ISCA Workshop on Speech Synthesis (SSW 10)},