On Building Mixed Lingual Speech Synthesis Systems

SaiKrishna Rallabandi, Alan W. Black


Codemixing, the phenomenon in which lexical items from one language are embedded in an utterance of another, is relatively frequent in multilingual communities. However, although today's TTS systems achieve high quality in the monolingual case, they are not yet capable of effectively handling such mixed content. In this paper, we investigate various mechanisms for building mixed lingual systems that are trained on a mixture of monolingual corpora and are capable of synthesizing such content. First, we explore manipulating the phoneme representation by mapping target-language words to source-language phones, with the aim of emulating a native speaker's intuition. We then present experiments at the acoustic stage, investigating training techniques at both the spectral and prosodic levels. Subjective evaluation shows that our systems are capable of generating high quality synthesis in codemixed scenarios.
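The phoneme-level idea of mapping target-language words to source-language phones can be pictured as a lexicon lookup followed by phone substitution. The sketch below is an illustration under stated assumptions, not the authors' implementation: the lexicon entries (ARPAbet-style, stress dropped), the native phone labels such as `t_dental_asp`, and the substitution table are all hypothetical.

```python
from typing import Dict, List

# Hypothetical English (embedded-language) lexicon, ARPAbet-style, no stress.
ENGLISH_LEXICON: Dict[str, List[str]] = {
    "thanks": ["TH", "AE", "NG", "K", "S"],
    "very": ["V", "EH", "R", "IY"],
    "weather": ["W", "EH", "DH", "ER"],
}

# Hypothetical substitution table from English phones to phones of the
# native (matrix-language) inventory. A phone may map to several native
# phones. Phones not listed are assumed to exist natively and are kept
# (lower-cased) unchanged.
PHONE_MAP: Dict[str, List[str]] = {
    "TH": ["t_dental_asp"],  # dental aspirated stop standing in for /θ/
    "DH": ["d_dental"],      # dental stop standing in for /ð/
    "V": ["v_w"],            # one labiodental approximant for /v/ ...
    "W": ["v_w"],            # ... and for /w/
    "ER": ["@", "r"],        # r-coloured vowel split into vowel + r
}


def to_native_phones(word: str) -> List[str]:
    """Map an English word to a sequence of (hypothetical) native phones."""
    phones = ENGLISH_LEXICON[word.lower()]
    native: List[str] = []
    for phone in phones:
        native.extend(PHONE_MAP.get(phone, [phone.lower()]))
    return native


if __name__ == "__main__":
    for w in ["Thanks", "very", "weather"]:
        print(w, to_native_phones(w))
```

Running this, `to_native_phones("weather")` yields `["v_w", "eh", "d_dental", "@", "r"]`, i.e. the word is rendered entirely with phones the native-language voice was trained on. A real system would also need a fallback (for example letter-to-sound rules) for words missing from the lexicon.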


DOI: 10.21437/Interspeech.2017-1244

Cite as: Rallabandi, S., Black, A.W. (2017) On Building Mixed Lingual Speech Synthesis Systems. Proc. Interspeech 2017, 52-56, DOI: 10.21437/Interspeech.2017-1244.


@inproceedings{Rallabandi2017,
  author={SaiKrishna Rallabandi and Alan W. Black},
  title={On Building Mixed Lingual Speech Synthesis Systems},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={52--56},
  doi={10.21437/Interspeech.2017-1244},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1244}
}