Variational Attention Using Articulatory Priors for Generating Code Mixed Speech Using Monolingual Corpora

SaiKrishna Rallabandi, Alan W. Black


Code mixing, the phenomenon in which lexical items from one language are embedded in an utterance of another, is relatively frequent in multilingual communities, and speech systems should therefore be able to process such content. However, building a voice capable of synthesizing such content typically requires bilingual recordings from the speaker, which may not always be easy to obtain. In this work, we present an approach for building mixed-lingual systems using only monolingual corpora. Specifically, we present a way to train a multi-speaker text-to-speech system by incorporating stochastic latent variables into the attention mechanism, with the objective of synthesizing code-mixed content. We constrain the prior distribution of these latent variables to match articulatory constraints. Subjective evaluation shows that our systems are capable of generating high-quality synthesis in code-mixed scenarios.
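The abstract does not specify how the latent variables enter the attention mechanism. One common formulation of variational attention in sequence-to-sequence models treats the attention context vector as a diagonal Gaussian latent variable, samples it with the reparameterization trick, and adds a KL term against a prior to the training loss. The sketch below illustrates that generic formulation only, under stated assumptions: the names `variational_attention_step` and `articulatory_prior_mean` are hypothetical, the prior mean is assumed to be some articulatory-feature embedding of the current phone supplied from outside, and the prior variance is assumed to be one. It is not a reproduction of the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deterministic_context(weights: Sequence[float],
                          encoder_states: Sequence[Sequence[float]]) -> List[float]:
    """Standard soft-attention context: weighted sum of encoder state vectors."""
    dim = len(encoder_states[0])
    return [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]


def sample_context(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_diag_gaussians(mu_q: Sequence[float], log_var_q: Sequence[float],
                      mu_p: Sequence[float], log_var_p: Sequence[float]) -> float:
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


def variational_attention_step(scores: Sequence[float],
                               encoder_states: Sequence[Sequence[float]],
                               log_var: Sequence[float],
                               articulatory_prior_mean: Sequence[float],
                               rng: random.Random) -> Tuple[List[float], float]:
    """One decoder step of a hypothetical variational attention layer.

    The posterior mean is the usual soft-attention context; the prior is
    assumed to be N(articulatory_prior_mean, I). Returns the sampled context
    vector and the KL term that would be added to the training loss.
    """
    weights = softmax(scores)
    mu = deterministic_context(weights, encoder_states)
    z = sample_context(mu, log_var, rng)
    prior_log_var = [0.0] * len(mu)  # unit-variance prior (assumption)
    kl = kl_diag_gaussians(mu, log_var, articulatory_prior_mean, prior_log_var)
    return z, kl


if __name__ == "__main__":
    rng = random.Random(0)
    states = [[0.0, 2.0], [2.0, 0.0], [1.0, 1.0]]
    z, kl = variational_attention_step([0.1, 0.3, -0.2], states,
                                       log_var=[-1.0, -1.0],
                                       articulatory_prior_mean=[1.0, 1.0],
                                       rng=rng)
    print("sampled context:", z, "KL:", kl)
```

Pulling the prior mean toward an articulatory embedding, rather than toward zero, is one way to express "the prior must match articulatory constraints": the KL term penalizes context vectors that drift away from what the articulatory features of the current sound would predict.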


DOI: 10.21437/Interspeech.2019-1103

Cite as: Rallabandi, S., Black, A.W. (2019) Variational Attention Using Articulatory Priors for Generating Code Mixed Speech Using Monolingual Corpora. Proc. Interspeech 2019, 3735-3739, DOI: 10.21437/Interspeech.2019-1103.


@inproceedings{Rallabandi2019,
  author={SaiKrishna Rallabandi and Alan W. Black},
  title={{Variational Attention Using Articulatory Priors for Generating Code Mixed Speech Using Monolingual Corpora}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3735--3739},
  doi={10.21437/Interspeech.2019-1103},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1103}
}