EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

Glottal Closure Instant Synchronous Sinusoidal Model for High Quality Speech Analysis/Synthesis

Parham Zolfaghari (1), Tomohiro Nakatani (1), Toshio Irino (2), Hideki Kawahara (2), Fumitada Itakura (3)

(1) NTT Corporation, Japan
(2) Wakayama University, Japan
(3) Nagoya University, Japan

In this paper, a glottal event synchronous sinusoidal model is proposed. A glottal event corresponds to the glottal closure instant (GCI), which is accurately estimated using group delay and time-domain fixed point analysis based on energy centroids. The GCI synchronous sinusoidal model allows processing suited to the inherent local properties of speech, resulting in phase matching between adjacent and corresponding harmonics, which is essential for precise speech analysis. Frequency domain fixed points, obtained by mapping filter center frequencies to the instantaneous frequencies of the filter outputs, yield highly accurate estimates of the constituent sinusoidal components. Adequate window selection and placement at the GCI are found to be important in obtaining stable sinusoidal components. We demonstrate that the GCI synchronous instantaneous frequency method greatly reduces spurious peaks in the spectrum and enables high quality synthesised speech. In speech quality evaluations, glottal synchronous analysis-synthesis results in a 0.4 improvement in mean opinion score (MOS) over conventional fixed frame rate analysis-synthesis.
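
The frequency domain fixed point idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian-windowed (Gabor) band-pass filters, the 40 Hz bandwidth, the two-sample phase difference used as the instantaneous frequency estimate, the synthetic three-harmonic test signal, and the names `instantaneous_frequency_map` and `find_fixed_points` are all assumptions made for illustration. The analysis instant `t_index` stands in for a GCI, which is simply chosen by hand here.

```python
import numpy as np

def instantaneous_frequency_map(x, fs, centers, t_index, bandwidth=40.0):
    """Instantaneous frequency (Hz) of each band-pass output at sample t_index.

    Assumption: each filter is a Gaussian-windowed complex exponential whose
    frequency response has a standard deviation of `bandwidth` Hz.
    """
    sigma = 1.0 / (2.0 * np.pi * bandwidth)      # envelope std in seconds
    half = int(5.0 * sigma * fs)                 # truncate at 5 sigma
    n = np.arange(-half, half + 1)
    envelope = np.exp(-0.5 * (n / (sigma * fs)) ** 2)
    # Signal segments centred on t_index and on the following sample.
    seg0 = x[t_index - half : t_index + half + 1]
    seg1 = x[t_index + 1 - half : t_index + half + 2]
    inst = np.empty(len(centers))
    for i, fc in enumerate(centers):
        analysis = envelope * np.exp(-2j * np.pi * fc * n / fs)
        y0 = np.sum(seg0 * analysis)             # filter output at t_index
        y1 = np.sum(seg1 * analysis)             # filter output at t_index + 1
        # Phase advance per sample, converted to Hz.
        inst[i] = np.angle(y1 * np.conj(y0)) * fs / (2.0 * np.pi)
    return inst

def find_fixed_points(centers, inst_freq):
    """Center frequencies where inst_freq - center crosses zero downwards."""
    diff = inst_freq - centers
    points = []
    for i in range(len(centers) - 1):
        if diff[i] > 0.0 and diff[i + 1] <= 0.0:
            # Linear interpolation of the zero crossing between two centers.
            frac = diff[i] / (diff[i] - diff[i + 1])
            points.append(centers[i] + frac * (centers[i + 1] - centers[i]))
    return np.array(points)

# Synthetic harmonic signal (hypothetical test data): 200, 400 and 600 Hz.
fs = 8000.0
t = np.arange(800) / fs
x = (1.0 * np.cos(2 * np.pi * 200.0 * t)
     + 0.6 * np.cos(2 * np.pi * 400.0 * t + 0.5)
     + 0.3 * np.cos(2 * np.pi * 600.0 * t + 1.0))

centers = np.arange(100.0, 701.0, 5.0)
inst_freq = instantaneous_frequency_map(x, fs, centers, t_index=400)
fixed_points = find_fixed_points(centers, inst_freq)
print(fixed_points)
```

Only downward zero crossings are kept in this sketch, since near a stable sinusoidal component the instantaneous frequency stays almost constant while the center frequency sweeps past it. Upward crossings, which fall in the transition regions between components, are discarded.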

Bibliographic reference.  Zolfaghari, Parham / Nakatani, Tomohiro / Irino, Toshio / Kawahara, Hideki / Itakura, Fumitada (2003): "Glottal closure instant synchronous sinusoidal model for high quality speech analysis/synthesis", In EUROSPEECH-2003, 2441-2444.