This paper describes an HMM-based speech synthesis system that uses glottal inverse filtering to generate natural-sounding synthetic speech. In the proposed system, speech is first parametrized into spectral and excitation features using a method based on glottal inverse filtering. These parameters are used to train an HMM system and are then generated from the trained HMM according to the text input. Glottal flow pulses extracted from real speech are used as the voice source, which is further modified according to the all-pole model parameters generated by the HMM. Preliminary experiments show that the proposed system is capable of generating natural-sounding speech, and that its quality is clearly better than that of a system using a conventional impulse train excitation model.
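The contrast between the two excitation models mentioned above can be illustrated with a toy source-filter sketch. This is not the authors' implementation. The raised-cosine pulse shape, the filter coefficients, the 16 kHz sampling rate, the 200 Hz pitch, and all function names are assumptions chosen for illustration only. The sketch passes either an impulse train or a repeated pulse shape through a single all-pole filter.

```python
import math

def all_pole_filter(excitation, a):
    """Filter through 1/A(z), where A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y -= ak * out[n - k]
        out.append(y)
    return out

def impulse_train(n_samples, period):
    """Conventional excitation: one unit impulse per pitch period."""
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def pulse_train(n_samples, period, pulse):
    """Place a fixed pulse shape at the start of every pitch period."""
    out = [0.0] * n_samples
    for start in range(0, n_samples, period):
        for i, p in enumerate(pulse):
            if start + i < n_samples:
                out[start + i] += p
    return out

def toy_pulse(length):
    """Hypothetical stand-in for an extracted glottal pulse (raised cosine)."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / length))
            for i in range(length)]

# Assumed values: 16 kHz sampling rate and 200 Hz pitch give an 80-sample period.
period = 80
n_samples = 400
a = [-1.3, 0.8]  # hypothetical stable second-order all-pole coefficients

impulse_exc = impulse_train(n_samples, period)
pulse_exc = pulse_train(n_samples, period, toy_pulse(40))

impulse_speech = all_pole_filter(impulse_exc, a)
pulse_speech = all_pole_filter(pulse_exc, a)
```

In this simplification, only the excitation differs between `impulse_speech` and `pulse_speech`. In a system of the kind described, the pulse would come from real speech and the filter parameters would be generated by the trained HMM for each frame.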
Bibliographic reference. Raitio, Tuomo / Suni, Antti / Pulakka, Hannu / Vainio, Martti / Alku, Paavo (2008): "HMM-based Finnish text-to-speech system utilizing glottal inverse filtering", In INTERSPEECH-2008, 1881-1884.