EUROSPEECH 2003 - INTERSPEECH 2003
This paper proposes a glottal event synchronous sinusoidal model. A glottal event here corresponds to the glottal closure instant (GCI), which is estimated accurately using group delay and time-domain fixed point analysis based on energy centroids. The GCI synchronous sinusoidal model allows processing adapted to the inherent local properties of speech, resulting in phase matching between adjacent and corresponding harmonics, which is essential for precise speech analysis. Frequency-domain fixed points of the mapping from filter center frequencies to the instantaneous frequencies of the filter outputs yield highly accurate estimates of the constituent sinusoidal components. Appropriate window selection and placement at the GCI are found to be important for obtaining stable sinusoidal components. We demonstrate that the GCI synchronous instantaneous frequency method greatly reduces spurious peaks in the spectrum and enables high quality synthesised speech. In speech quality evaluations, glottal synchronous analysis-synthesis results in a 0.4 improvement in mean opinion score (MOS) over conventional fixed frame rate analysis-synthesis.
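The frequency-domain fixed point idea can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian (Gabor-like) filter shape, the 40 Hz bandwidth, the 10 Hz grid of center frequencies, the synthetic three-harmonic test signal, and the function name `if_fixed_points` are all assumptions chosen for illustration. The analysis instant `n0` is an arbitrary sample here; in a GCI synchronous scheme it would be placed at an estimated GCI.

```python
import numpy as np

def if_fixed_points(x, fs, n0, centers, bw_hz=40.0):
    """Stable fixed points of the map: filter center frequency ->
    instantaneous frequency (IF) of that filter's output at sample n0.

    Assumption: each filter is a Gaussian-windowed complex exponential.
    """
    sigma = fs / (2 * np.pi * bw_hz)           # Gaussian std in samples
    half = int(4 * sigma)                      # truncate window at 4 sigma
    k = np.arange(-half, half + 1)
    win = np.exp(-0.5 * (k / sigma) ** 2)

    inst = np.empty(len(centers))
    for i, fc in enumerate(centers):
        kernel = win * np.exp(-2j * np.pi * fc * k / fs)
        y0 = np.sum(x[n0 + k] * kernel)        # filter output at n0
        y1 = np.sum(x[n0 + 1 + k] * kernel)    # filter output at n0 + 1
        # IF from the phase advance over one sample
        inst[i] = np.angle(y1 * np.conj(y0)) * fs / (2 * np.pi)

    # A stable fixed point is where IF(fc) - fc crosses zero downwards.
    g = inst - centers
    points = []
    for i in range(len(centers) - 1):
        if g[i] > 0 and g[i + 1] <= 0:
            t = g[i] / (g[i] - g[i + 1])       # linear interpolation
            points.append(centers[i] + t * (centers[i + 1] - centers[i]))
    return np.array(points)

# Synthetic test signal (assumed): three harmonics of 200 Hz.
fs = 8000
n = np.arange(800)
x = sum(a * np.cos(2 * np.pi * f * n / fs + p)
        for a, f, p in [(1.0, 200.0, 0.3), (0.8, 400.0, 1.1), (0.6, 600.0, 2.0)])

centers = np.arange(105.0, 700.0, 10.0)        # filter center frequencies in Hz
peaks = if_fixed_points(x, fs, 400, centers)
print(np.round(peaks, 1))
```

For this synthetic signal the returned fixed points should lie close to the three harmonic frequencies. Upward zero crossings, which occur between harmonics where two components compete, are discarded, so they do not appear as spurious peaks.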
Bibliographic reference. Zolfaghari, Parham / Nakatani, Tomohiro / Irino, Toshio / Kawahara, Hideki / Itakura, Fumitada (2003): "Glottal closure instant synchronous sinusoidal model for high quality speech analysis/synthesis", In EUROSPEECH-2003, 2441-2444.