This paper describes the application of the Harmonic plus Noise Model (HNM) to concatenative Text-to-Speech (TTS) synthesis. In HNM, a speech signal is represented as a time-varying harmonic component plus a modulated noise component. Decomposing the speech signal into these two components allows for more natural-sounding modifications of the signal (e.g., source and filter modifications). The parametric representation of speech using HNM also provides a straightforward way of smoothing discontinuities between acoustic units around concatenation points. Formal listening tests have shown that HNM provides high-quality speech synthesis while outperforming other synthesis models (e.g., TD-PSOLA) in intelligibility, naturalness and pleasantness.
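As an illustration of the decomposition described above, the following minimal Python sketch builds one short frame as a sum of harmonics of a fundamental frequency plus white noise scaled by a time-domain envelope. It is not the paper's implementation: the function name `synthesize_hnm_frame`, its parameters, the raised-cosine envelope, and the absence of any spectral shaping of the noise are all assumptions made for the example.

```python
import math
import random


def synthesize_hnm_frame(f0, amplitudes, phases, noise_gain,
                         sample_rate=16000, num_samples=160, seed=0):
    """Return (harmonic, noise, total) sample lists for one short frame.

    Hypothetical helper: parameters are held constant over the frame,
    whereas a real system would let them vary over time.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    harmonic, noise = [], []
    for n in range(num_samples):
        t = n / sample_rate
        # Harmonic part: sinusoids at integer multiples of f0.
        h = sum(a * math.cos(2.0 * math.pi * (k + 1) * f0 * t + p)
                for k, (a, p) in enumerate(zip(amplitudes, phases)))
        # Noise part: white Gaussian noise modulated by an envelope that
        # repeats once per pitch period (assumed raised-cosine shape).
        envelope = 0.5 * (1.0 - math.cos(2.0 * math.pi * f0 * t))
        e = noise_gain * envelope * rng.gauss(0.0, 1.0)
        harmonic.append(h)
        noise.append(e)
    # The modeled signal is the sum of the two components.
    total = [h + e for h, e in zip(harmonic, noise)]
    return harmonic, noise, total


harmonic, noise, total = synthesize_hnm_frame(
    f0=100.0, amplitudes=[1.0, 0.5], phases=[0.0, 0.0], noise_gain=0.1)
```

Because the two components are kept separate, each can be modified independently (for example, changing `f0` only affects the harmonic part and the envelope period), which is the property the abstract credits for more natural-sounding modifications.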
Cite as: Stylianou, Y. (1998) Concatenative speech synthesis using a harmonic plus noise model. Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3), 261-266
@inproceedings{stylianou98_ssw,
  author={Yannis Stylianou},
  title={{Concatenative speech synthesis using a harmonic plus noise model}},
  year=1998,
  booktitle={Proc. 3rd ESCA/COCOSDA Workshop on Speech Synthesis (SSW 3)},
  pages={261--266}
}