INTERSPEECH 2013
This paper describes a statistical parametric speech synthesizer that generates highly intelligible speech in noisy environments, despite having been trained on an ordinary synthesis database and without any adaptation data. Using a simple and flexible vocoder based on a harmonic model, it applies several noise-independent modifications to durations, pitch level and range, energy contour, formant sharpness, and the intensity of particular spectral bands. The system has been evaluated in a large subjective test, the results of which show that the suggested approach clearly outperforms the reference TTS systems and, in some conditions, even unmodified natural speech.
Bibliographic reference. Erro, D. / Zorilă, T. C. / Stylianou, Yannis / Navas, E. / Hernaez, I. (2013): "Statistical synthesizer with embedded prosodic and spectral modifications to generate highly intelligible speech in noise", In INTERSPEECH-2013, 3557-3561.