Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Robust Fundamental Frequency Estimation Using Instantaneous Frequencies of Harmonic Components

Yoshinori Atake (1), Toshio Irino (2,4), Hideki Kawahara (3,4,2), Jinlin Lu (1), Satoshi Nakamura (1), Kiyohiro Shikano (1)

(1) Graduate School of Information Science, Nara Institute of Science and Technology, Japan
(2) ATR Human Information Processing Laboratory, Kyoto, Japan
(3) Faculty of Systems Engineering, Wakayama University, Japan
(4) CREST, Japan

This paper proposes a noise-tolerant method for fundamental frequency (F0) extraction. The method introduces several new ideas. One is estimating the instantaneous frequencies of the higher harmonic components. Another is an adaptive weighting function, designed from a bandwidth equation, that combines the F0 information carried by those harmonic components. To evaluate the proposed method, we constructed a relatively large database of simultaneous recordings of speech waveforms and electroglottography (EGG) signals. The database consists of 30 sentences pronounced by each of 14 male and 14 female normal subjects, i.e., 840 sentences in total. The total duration is about 35 minutes, of which about 20 minutes are voiced. Four F0 extraction methods were compared under additive noise: the proposed method, the original TEMPO, an improved cepstrum method, and a widely used F0 extraction program in ESPS. The results were as follows: 1) the proposed method outperforms all of the other methods whenever the SNR exceeds about 2 dB; 2) at high SNRs (> 15 dB), the correct rates of the proposed method and the original TEMPO are about 95%, much better than those of the improved cepstrum method (92%) and the ESPS program (89%); and 3) the correct rates of all methods fall below 62% at an SNR of 0 dB. The proposed method therefore improves performance at low SNRs while maintaining the high accuracy inherited from the original TEMPO at high SNRs.
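The core idea can be illustrated with a minimal Python/NumPy sketch. F0 is read off the instantaneous frequencies of several harmonics, and the per-harmonic estimates are combined with weights. This is not the authors' TEMPO-based implementation. Here each harmonic's instantaneous frequency is obtained by complex demodulation around k·f0_init followed by a Hann low-pass filter. The paper's bandwidth-equation weighting is replaced by a hypothetical inverse-variance weight. The function names `harmonic_if_f0` and `add_white_noise`, the initial estimate `f0_init`, and the 30 ms smoothing window are all illustrative assumptions. `add_white_noise` mimics an additive-noise condition at a chosen SNR.

```python
import numpy as np


def add_white_noise(x, snr_db, rng):
    """Add white Gaussian noise so that 10*log10(P_signal / P_noise) == snr_db."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return x + rng.normal(0.0, np.sqrt(p_noise), size=x.shape)


def harmonic_if_f0(x, fs, f0_init, n_harmonics=5, win_ms=30.0):
    """Sketch: estimate F0 from instantaneous frequencies (IFs) of harmonics.

    Assumption: a rough initial estimate f0_init is available. Harmonic k is
    shifted to near 0 Hz, low-pass filtered, and the slope of its unwrapped
    phase gives its IF. Each harmonic then "votes" for F0 as IF_k / k.
    """
    t = np.arange(len(x)) / fs
    win = max(1, int(round(win_ms * 1e-3 * fs)))
    kernel = np.hanning(win)
    kernel /= kernel.sum()                     # unity gain at 0 Hz
    f0_votes, weights = [], []
    for k in range(1, n_harmonics + 1):
        # Complex demodulation: move harmonic k from about k*f0_init to near DC
        baseband = x * np.exp(-2j * np.pi * k * f0_init * t)
        # Hann low-pass suppresses neighbouring harmonics and the negative image
        lp = np.convolve(baseband, kernel, mode="valid")
        # IF offset (Hz) from the derivative of the unwrapped phase
        phase = np.unwrap(np.angle(lp))
        offset = np.diff(phase) * fs / (2 * np.pi)
        f0_k = (k * f0_init + offset) / k      # per-sample F0 vote of harmonic k
        f0_votes.append(np.median(f0_k))
        # Hypothetical weight (not the paper's bandwidth equation):
        # harmonics whose IF track is steadier get more influence
        weights.append(1.0 / (np.var(f0_k) + 1e-12))
    w = np.asarray(weights)
    return float(np.dot(w, f0_votes) / w.sum())
```

A usage example on a synthetic five-harmonic signal with a true F0 of 120 Hz and a deliberately offset initial guess:

```python
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
clean = sum(np.cos(2 * np.pi * k * 120.0 * t) / k for k in range(1, 6))
noisy = add_white_noise(clean, snr_db=10.0, rng=np.random.default_rng(0))
print(harmonic_if_f0(noisy, fs, f0_init=115.0))  # should be close to 120 Hz
```

The inverse-variance weight is only a stand-in. It simply lets harmonics with stable IF tracks, typically the strong low-order ones, dominate the combined estimate.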


Bibliographic reference. Atake, Yoshinori / Irino, Toshio / Kawahara, Hideki / Lu, Jinlin / Nakamura, Satoshi / Shikano, Kiyohiro (2000): "Robust fundamental frequency estimation using instantaneous frequencies of harmonic components", In ICSLP-2000, vol. 2, pp. 907-910.