We introduce phylogenetic and areal language features to the domain of multilingual text-to-speech synthesis. Intuitively, enriching the existing universal phonetic features with cross-lingual shared representations should benefit multilingual acoustic models and help address issues such as data scarcity for low-resource languages. We investigate these representations using acoustic models based on long short-term memory recurrent neural networks. Subjective evaluations conducted on eight languages from diverse language families show that phylogenetic and areal representations sometimes lead to significant improvements in multilingual synthesis quality. To better leverage these novel features, improving the baseline phonetic representation may be necessary.
Cite as: Gutkin, A., Sproat, R. (2017) Areal and Phylogenetic Features for Multilingual Speech Synthesis. Proc. Interspeech 2017, 2078-2082, doi: 10.21437/Interspeech.2017-160
@inproceedings{gutkin17_interspeech,
  author={Alexander Gutkin and Richard Sproat},
  title={{Areal and Phylogenetic Features for Multilingual Speech Synthesis}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2078--2082},
  doi={10.21437/Interspeech.2017-160}
}