Eighth ISCA Workshop on Speech Synthesis

Barcelona, Catalonia, Spain
August 31-September 2, 2013

Noise Robustness in HMM-TTS Speaker Adaptation

Kayoko Yanagisawa, Javier Latorre, Vincent Wan, Mark J. F. Gales, Simon King

Toshiba Research Europe Ltd., UK

Speaker adaptation for TTS has received growing attention in recent years for applications such as voice customisation and voice banking. If these applications are offered as an internet service, there is no control over the quality of the data that can be collected: it may be noisy, with people talking in the background, or recorded in a reverberant environment. This makes adaptation more difficult. This paper explores the effect of different levels of additive and convolutional noise on speaker adaptation techniques based on cluster adaptive training (CAT) and average voice model (AVM). The results indicate that although both techniques suffer some degradation, CAT is in general more robust than AVM.

Index Terms: speech synthesis, cluster adaptive training, speaker adaptation, average voice models, noise robust adaptation

Bibliographic reference. Yanagisawa, Kayoko / Latorre, Javier / Wan, Vincent / Gales, Mark J. F. / King, Simon (2013): "Noise robustness in HMM-TTS speaker adaptation", in SSW8, 119-124.