Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Accurate Vocal Event Detection Method Based on a Fixed-Point Analysis of Mapping from Time to Weighted Average Group Delay

Hideki Kawahara (1), Yoshinori Atake (2), Parham Zolfaghari (3)

(1) Wakayama University/ATR/CREST, Wakayama, Japan
(2) NAIST, Ikoma, Nara, Japan
(3) CIAIR/Nagoya University, Nagoya, Aichi, Japan

A new procedure for event detection and characterization is proposed, based on group delay and fixed-point analysis. The method detects the precise timing and temporal spread of speech events such as vocal fold closure. A mapping from the center of a Gaussian time window to the mean time yields event locations as its fixed points. Refining these initial estimates with minimum-phase group delay functions derived from the amplitude spectra gives accurate estimates of the event locations and of the excitation duration of each event. The proposed algorithm was tested on synthetic speech samples and on a natural speech database of simultaneously recorded sound waveforms and EGG signals. These tests showed that the method estimates vocal fold closure instants with a timing accuracy of 60 µs to 210 µs in standard deviation. The algorithm is designed for real-time operation: it makes extensive use of FFTs and involves no iterative procedures. It is potentially a very powerful tool for speech diagnosis and for the construction of very high quality speech manipulation systems.
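
The fixed-point idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it computes the energy-weighted mean time of the Gaussian-windowed signal directly in the time domain (which equals the power-spectrum-weighted average group delay), whereas the paper describes an FFT-based formulation, and it omits the minimum-phase refinement step entirely. The function names `mean_time_map` and `find_fixed_points`, the window width `sigma`, the 4-sigma truncation, and the impulse-train test signal are all assumptions made for illustration.

```python
import math


def mean_time_map(x, fs, sigma, eps=1e-12):
    """Map each window center (one per sample) to the energy-weighted
    mean time, in seconds, of the Gaussian-windowed signal.

    Assumed window: w(t) = exp(-t^2 / (2 sigma^2)), truncated at 4 sigma.
    Centers whose window contains no energy map to NaN.
    """
    n = len(x)
    half = int(math.ceil(4 * sigma * fs))
    # Energy is weighted by w(t)^2 = exp(-t^2 / sigma^2).
    w2 = [math.exp(-((k / fs) / sigma) ** 2) for k in range(-half, half + 1)]
    energy = [v * v for v in x]
    mapped = []
    for c in range(n):
        num = 0.0
        den = 0.0
        for k in range(-half, half + 1):
            i = c + k
            if 0 <= i < n:
                p = w2[k + half] * energy[i]
                num += p * (i / fs)
                den += p
        mapped.append(num / den if den > eps else float("nan"))
    return mapped


def find_fixed_points(mapped, fs):
    """Return times (seconds) where mapped(t_c) - t_c crosses zero going
    downward, i.e. the stable fixed points of the mapping.

    Upward crossings (midway between events) are ignored. The crossing
    time is refined by linear interpolation between adjacent samples.
    """
    events = []
    for c in range(len(mapped) - 1):
        d0 = mapped[c] - c / fs
        d1 = mapped[c + 1] - (c + 1) / fs
        if math.isnan(d0) or math.isnan(d1):
            continue
        if d0 > 0.0 >= d1:
            frac = d0 / (d0 - d1)
            events.append((c + frac) / fs)
    return events


if __name__ == "__main__":
    # Hypothetical test signal: unit impulses every 10 ms at 8 kHz.
    fs = 8000
    x = [0.0] * 400
    for i in (100, 180, 260):
        x[i] = 1.0
    events = find_fixed_points(mean_time_map(x, fs, sigma=0.002), fs)
    print([round(e * 1000, 3) for e in events])  # event times in ms
```

For this impulse train the detected fixed points should fall at the impulse positions. Only downward zero crossings are kept because, near an isolated event, the mean time stays at the event while the window center sweeps past it, so the difference between the two changes sign from positive to negative exactly at the event.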

Bibliographic reference. Kawahara, Hideki / Atake, Yoshinori / Zolfaghari, Parham (2000): "Accurate vocal event detection method based on a fixed-point analysis of mapping from time to weighted average group delay", in ICSLP-2000, vol. 4, 664-667.