Speaker adaptation for TTS has received increasing attention in recent years for applications such as voice customisation and voice banking. If these applications are offered as an internet service, there is no control over the quality of the data collected: recordings may be noisy, with people talking in the background, or made in a reverberant environment. This makes adaptation more difficult. This paper explores the effect of different levels of additive and convolutional noise on speaker adaptation techniques based on cluster adaptive training (CAT) and average voice model (AVM). The results indicate that although both techniques suffer degradation to some extent, CAT is in general more robust than AVM.
Index Terms: speech synthesis, cluster adaptive training, speaker adaptation, average voice models, noise robust adaptation
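To make the two kinds of corruption named in the abstract concrete, the following is a minimal sketch of how additive noise at a chosen signal-to-noise ratio and convolutional noise (reverberation) could be simulated on a waveform. It is not the procedure used in the paper, which the abstract does not describe; the function names `add_noise_at_snr` and `apply_reverb`, the SNR-based scaling, and the use of a room impulse response are all assumptions made for illustration.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db):
    """Additive noise: mix `noise` into `speech` at a target SNR in dB.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Tile or truncate the noise so it matches the speech length.
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that speech_power / scaled_noise_power
    # equals 10^(snr_db / 10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


def apply_reverb(speech, impulse_response):
    """Convolutional noise: convolve `speech` with an impulse response.

    The output is truncated to the original length (an assumption;
    the reverberant tail is discarded).
    """
    return np.convolve(speech, impulse_response)[: len(speech)]
```

Lower values of `snr_db` give a stronger additive corruption, while a longer impulse response corresponds to a more reverberant environment; sweeping either gives the kind of graded noise levels the abstract refers to.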
Cite as: Yanagisawa, K., Latorre, J., Wan, V., Gales, M.J.F., King, S. (2013) Noise robustness in HMM-TTS speaker adaptation. Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8), 119-124
@inproceedings{yanagisawa13_ssw,
  author={Kayoko Yanagisawa and Javier Latorre and Vincent Wan and Mark J. F. Gales and Simon King},
  title={{Noise robustness in HMM-TTS speaker adaptation}},
  year=2013,
  booktitle={Proc. 8th ISCA Workshop on Speech Synthesis (SSW 8)},
  pages={119--124}
}