14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Statistical Synthesizer with Embedded Prosodic and Spectral Modifications to Generate Highly Intelligible Speech in Noise

D. Erro (1), T. C. Zorilă (2), Yannis Stylianou (3), E. Navas (1), I. Hernaez (1)

(1) Universidad del País Vasco, Spain
(2) Universitatea Politehnica din Bucureşti, Romania
(3) FORTH, Greece

This paper describes a statistical parametric speech synthesizer that, despite having been trained on an ordinary synthesis database and without any adaptation data, is able to generate highly intelligible speech in noisy environments. By using a simple and flexible vocoder based on a harmonic model, it applies several noise-independent modifications to durations, pitch level and range, energy contour, formant sharpness, and intensity of particular spectral bands. The system has been evaluated by means of a large subjective test, the results of which show that the suggested approach clearly outperforms the reference TTS systems and, in some conditions, even unmodified natural speech.


Bibliographic reference. Erro, D. / Zorilă, T. C. / Stylianou, Yannis / Navas, E. / Hernaez, I. (2013): "Statistical synthesizer with embedded prosodic and spectral modifications to generate highly intelligible speech in noise", in INTERSPEECH-2013, 3557-3561.