Control over voice quality, e.g. breathy and tense voice, is important for speech synthesis applications. For example, transformations can be used to modify aspects of the voice related to the speaker’s identity and to improve expressiveness. However, it is hard to modify the voice characteristics of synthetic speech without degrading speech quality. State-of-the-art statistical speech synthesisers, in particular, do not typically allow control over the parameters of the glottal source, which are strongly correlated with voice quality. Consequently, the control of voice characteristics in these systems is limited. In contrast, the HMM-based speech synthesiser proposed in this paper uses an acoustic glottal source model. The system passes the glottal signal through a whitening filter to obtain the excitation of voiced sounds. This technique, called glottal post-filtering, makes it possible to transform the voice characteristics of the synthetic speech by modifying the source model parameters.
We evaluated the proposed synthesiser in a perceptual experiment, in terms of speech naturalness, intelligibility, and similarity to the original speaker’s voice. The results show that it performed as well as an HMM-based synthesiser that generates the speech signal with a commonly used high-quality speech vocoder.
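The glottal post-filtering idea described above (a parametric glottal signal passed through a whitening filter to give the voiced excitation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a Rosenberg-style pulse stands in for the acoustic glottal source model, the whitening filter is taken to be an LPC inverse filter estimated from the glottal signal itself, and all function names and parameter values (`glottal_signal`, `whiten`, `open_quotient`, `speed_quotient`, the filter order) are hypothetical.

```python
import numpy as np


def rosenberg_pulse(n_open, n_close):
    """One Rosenberg-style glottal flow pulse (stand-in source model, an assumption)."""
    t_open = np.arange(n_open) / n_open
    t_close = np.arange(n_close) / n_close
    rising = 0.5 * (1.0 - np.cos(np.pi * t_open))   # opening phase, 0 -> 1
    falling = np.cos(0.5 * np.pi * t_close)         # closing phase, 1 -> ~0
    return np.concatenate([rising, falling])


def glottal_signal(f0, fs, duration, open_quotient=0.6, speed_quotient=2.0):
    """Periodic glottal flow derivative controlled by a few source parameters."""
    period = int(round(fs / f0))
    n_glottal = int(round(open_quotient * period))
    n_open = int(round(n_glottal * speed_quotient / (1.0 + speed_quotient)))
    n_close = n_glottal - n_open
    cycle = np.zeros(period)                        # remainder is the closed phase
    cycle[:n_glottal] = rosenberg_pulse(n_open, n_close)
    flow = np.tile(cycle, int(duration * fs / period))
    return np.diff(flow, prepend=0.0)               # flow derivative


def lpc_coefficients(x, order):
    """Autocorrelation-method LPC; returns [1, a_1, ..., a_p] of the inverse filter A(z)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    r[0] *= 1.0 + 1e-6                              # light regularisation for stability
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate([[1.0], a])


def whiten(x, order=10):
    """Flatten the spectral envelope of x by FIR filtering with its own A(z)."""
    a = lpc_coefficients(x, order)
    return np.convolve(x, a)[: len(x)]


def spectral_flatness(x):
    """Geometric over arithmetic mean of the power spectrum, DC bin excluded."""
    power = np.abs(np.fft.rfft(x))[1:] ** 2 + 1e-12
    return np.exp(np.mean(np.log(power))) / np.mean(power)


fs, f0 = 16000, 120.0
period = int(round(fs / f0))
source = glottal_signal(f0, fs, duration=0.2)       # parametric glottal signal
excitation = whiten(source)                         # spectrally flattened excitation

# Compare flatness over one full pitch period, before and after whitening.
one_period = slice(period, 2 * period)
print("flatness before:", spectral_flatness(source[one_period]))
print("flatness after: ", spectral_flatness(excitation[one_period]))
```

In this sketch, changing `open_quotient` or `speed_quotient` alters the shape of the glottal signal before whitening, which is the kind of source-parameter modification the abstract refers to. How the actual system designs its whitening filter and which source parameters it exposes are not specified here.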
Index Terms: HMM-based speech synthesis, voice quality, glottal post-filter
Cite as: Cabral, J.P., Renals, S., Richmond, K., Yamagishi, J. (2010) An HMM-based speech synthesiser using glottal post-filtering. Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7), 365-370
@inproceedings{cabral10_ssw, author={João P. Cabral and Steve Renals and Korin Richmond and Junichi Yamagishi}, title={{An HMM-based speech synthesiser using glottal post-filtering}}, year=2010, booktitle={Proc. 7th ISCA Workshop on Speech Synthesis (SSW 7)}, pages={365--370} }