We present a study on robust pitch estimation that integrates spectral and temporal information. Spectral harmonics are an important representation of the fundamental frequency (F0) of speech. The harmonic-related spectral peaks of speech evolve much more slowly than the spectral peaks of noise. This motivates the temporally accumulated peak spectrum (TAPS), which is computed by accumulating spectral peaks over consecutive analysis frames. In TAPS, harmonic-related peaks are concentrated around the F0 and its multiples, while the peaks caused by noise are irregularly distributed and have relatively small amplitude. A pitch estimation method is derived from TAPS: peak locations in the autocorrelation of TAPS indicate the frequency separation between the harmonic peaks, which is used to estimate the F0. The proposed method is evaluated on speech signals corrupted by white noise, speech noise and babble noise. The results show that our method is more robust and reliable than conventional methods.
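As a rough illustration of the idea described above, and not the authors' implementation, the following Python sketch accumulates per-frame spectral peaks into a TAPS-like vector and reads an F0 estimate from the strongest peak of its autocorrelation along the frequency axis. The frame length, hop size, FFT size, F0 search range, the light smoothing of the accumulated spectrum, the synthetic test signal, and all function names are assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def accumulate_peak_spectrum(signal, frame_len=1024, hop=256, n_fft=4096):
    """Sum the local spectral peaks of consecutive frames (a TAPS-like vector)."""
    window = np.hanning(frame_len)
    acc = np.zeros(n_fft // 2 + 1)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame, n_fft))
        # Keep only bins that are local maxima of this frame's magnitude spectrum.
        is_peak = np.zeros(mag.shape, dtype=bool)
        is_peak[1:-1] = (mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:])
        acc += np.where(is_peak, mag, 0.0)
    return acc


def estimate_f0(signal, fs, f0_min=60.0, f0_max=400.0, n_fft=4096, smooth_bins=3):
    """Estimate F0 from the autocorrelation of the accumulated peak spectrum."""
    acc = accumulate_peak_spectrum(signal, n_fft=n_fft)
    # Assumption (not from the abstract): smooth lightly so that peaks which
    # land one bin apart in different frames still overlap.
    kernel = np.hanning(2 * smooth_bins + 3)
    acc = np.convolve(acc, kernel, mode="same")
    # Autocorrelation along the frequency axis; index 0 corresponds to lag 0.
    acf = np.correlate(acc, acc, mode="full")[len(acc) - 1:]
    bin_hz = fs / n_fft
    lo = int(np.ceil(f0_min / bin_hz))
    hi = int(np.floor(f0_max / bin_hz))
    # The strongest lag in the search range is taken as the harmonic spacing.
    best_lag = lo + int(np.argmax(acf[lo:hi + 1]))
    return best_lag * bin_hz


# Hypothetical test signal: ten harmonics of 125 Hz plus white noise.
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
clean = sum(np.sin(2 * np.pi * 125.0 * k * t) / k for k in range(1, 11))
noisy = clean + 0.1 * rng.standard_normal(t.size)
f0_hat = estimate_f0(noisy, fs)
print(f"Estimated F0: {f0_hat:.1f} Hz")
```

Restricting the lag search to a plausible F0 range keeps the zero-lag peak out of the search. The lag at twice the harmonic spacing can still fall inside this range, so the sketch relies on the fundamental spacing producing the largest autocorrelation value. The frequency resolution of the estimate is one FFT bin (`fs / n_fft`), which could be refined by interpolating around the chosen lag.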
Bibliographic reference. Huang, Feng / Lee, Tan (2010): "Pitch estimation in noisy speech based on temporal accumulation of spectrum peaks", In INTERSPEECH-2010, 641-644.