This paper describes speaker and language adaptive training (SLAT), a technique for HMM-based polyglot speech synthesis, and its evaluation on a multi-lingual speech corpus. SLAT enables adaptive training and synthesis across multiple speakers and multiple languages. Experimental results show that SLAT achieves better naturalness than both speaker-adaptively trained language-dependent (LD-SAT) and language-independent (LI-SAT) models. In speaker similarity tests for cross-lingual adaptation, SLAT and LI-SAT outperform LD-SAT, but significant differences remain between polyglot adaptation and intra-language adaptation.
Cite as: Zen, H., Braunschweiler, N., Buchholz, S., Knill, K., Krstulovic, S., Latorre, J. (2010) HMM-based polyglot speech synthesis by speaker and language adaptive training. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 186-191
@inproceedings{zen10_ssw,
  author={Heiga Zen and Norbert Braunschweiler and Sabine Buchholz and Kate Knill and Sacha Krstulovic and Javier Latorre},
  title={{HMM-based polyglot speech synthesis by speaker and language adaptive training}},
  year=2010,
  booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)},
  pages={186--191}
}