Investigating the Effects of Noisy and Reverberant Speech in Text-to-Speech Systems

David Ayllón, Héctor A. Sánchez-Hevia, Carol Figueroa, Pierre Lanchantin


The quality of the voices synthesized by a Text-to-Speech (TTS) system depends on the quality of the training data. In a real-world scenario of TTS personalization from a user’s voice recordings, the recordings are usually affected by noise and reverberation. Speech enhancement can help clean the corrupted speech, but it is necessary to understand the effects that noise and reverberation have on the different statistical models that compose the TTS system. In this work, we perform a thorough study of how noise and reverberation impact the acoustic and duration models of the TTS system. We also evaluate the effectiveness of time-frequency masking for cleaning the training data. Objective and subjective evaluations reveal that, under normal recording scenarios, noise degrades the naturalness of the synthesized speech more than reverberation does.


DOI: 10.21437/Interspeech.2019-3104

Cite as: Ayllón, D., Sánchez-Hevia, H.A., Figueroa, C., Lanchantin, P. (2019) Investigating the Effects of Noisy and Reverberant Speech in Text-to-Speech Systems. Proc. Interspeech 2019, 1511-1515, DOI: 10.21437/Interspeech.2019-3104.


@inproceedings{Ayllón2019,
  author={David Ayllón and Héctor A. Sánchez-Hevia and Carol Figueroa and Pierre Lanchantin},
  title={{Investigating the Effects of Noisy and Reverberant Speech in Text-to-Speech Systems}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={1511--1515},
  doi={10.21437/Interspeech.2019-3104},
  url={http://dx.doi.org/10.21437/Interspeech.2019-3104}
}