In this paper we propose a model-based approach to instantaneous pitch estimation in noisy speech that incorporates pitch smoothness assumptions into the well-known harmonic model. The latent pitch contour is modeled using a basis of smooth polynomials and is fit to waveform data through a harmonic model whose partials have time-varying amplitudes. The resulting nonlinear least-squares problem is solved with the Gauss-Newton method, using a novel initialization step that greatly increases the algorithm's efficiency. We demonstrate the accuracy and robustness of our method through comparisons to state-of-the-art pitch estimation algorithms using both simulated and real waveform data.
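The estimation procedure summarized in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`design_matrix`, `residual`, `init_constant_pitch`, `gauss_newton`) are hypothetical, the Jacobian is approximated by finite differences, and a step-halving safeguard is added to the Gauss-Newton update. The paper's novel initialization is not described in the abstract, so a plain grid search over constant pitch values stands in for it. Amplitudes are eliminated by linear least squares at every step and default to constants per frame (`amp_deg=0`) for simplicity.

```python
import numpy as np

def design_matrix(coef, t, n_harm=3, amp_deg=0):
    """Harmonic basis for pitch f0(t) = sum_j coef[j] * t**j (in Hz)."""
    # Phase is 2*pi times the integral of the polynomial pitch contour.
    phase = 2 * np.pi * sum(c * t ** (j + 1) / (j + 1) for j, c in enumerate(coef))
    cols = []
    for k in range(1, n_harm + 1):
        for d in range(amp_deg + 1):  # amp_deg > 0 gives polynomial amplitudes
            cols.append(t ** d * np.cos(k * phase))
            cols.append(t ** d * np.sin(k * phase))
    return np.stack(cols, axis=1)

def residual(coef, t, x, **kw):
    """Residual after solving for the (linear) amplitude parameters."""
    A = design_matrix(coef, t, **kw)
    amp = np.linalg.lstsq(A, x, rcond=None)[0]
    return x - A @ amp

def init_constant_pitch(t, x, grid=np.arange(80.0, 300.0, 1.0), **kw):
    """Assumed stand-in initialization: best constant pitch on a coarse grid."""
    costs = [np.sum(residual(np.array([f, 0.0]), t, x, **kw) ** 2) for f in grid]
    return np.array([grid[int(np.argmin(costs))], 0.0])

def gauss_newton(coef0, t, x, n_iter=30, eps=1e-3, **kw):
    """Gauss-Newton on the pitch coefficients with a step-halving safeguard."""
    coef = np.asarray(coef0, dtype=float)
    r = residual(coef, t, x, **kw)
    for _ in range(n_iter):
        # Finite-difference Jacobian of the residual w.r.t. each coefficient.
        J = np.empty((t.size, coef.size))
        for j in range(coef.size):
            step = np.zeros_like(coef)
            step[j] = eps
            J[:, j] = (residual(coef + step, t, x, **kw) - r) / eps
        delta = np.linalg.lstsq(J, -r, rcond=None)[0]
        scale = 1.0
        while scale > 1e-4:
            r_new = residual(coef + scale * delta, t, x, **kw)
            if r_new @ r_new < r @ r:  # accept only if the cost decreases
                coef, r = coef + scale * delta, r_new
                break
            scale *= 0.5
        else:
            break  # no improving step found; stop iterating
    return coef

# Synthetic 40 ms frame at 8 kHz with pitch rising linearly from 120 Hz.
fs = 8000.0
t = np.arange(320) / fs
true_coef = np.array([120.0, 200.0])  # f0(t) = 120 + 200 t
phase = 2 * np.pi * (true_coef[0] * t + true_coef[1] * t ** 2 / 2)
rng = np.random.default_rng(0)
x = (np.cos(phase) + 0.5 * np.sin(2 * phase + 0.3) + 0.25 * np.cos(3 * phase + 1.0)
     + 0.01 * rng.standard_normal(t.size))

est = gauss_newton(init_constant_pitch(t, x), t, x)
print("estimated pitch coefficients:", est)
```

Here `est[0]` is the estimated pitch at the start of the frame in Hz and `est[1]` its slope in Hz/s. In this sketch, raising `amp_deg` lets the amplitudes absorb small pitch shifts, so the pitch coefficients can become harder to pin down on short frames.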
Cite as: Hong, J.O., Wolfe, P.J. (2009) Model-based estimation of instantaneous pitch in noisy speech. Proc. Interspeech 2009, 112-115, doi: 10.21437/Interspeech.2009-26
@inproceedings{hong09_interspeech,
  author={Jung Ook Hong and Patrick J. Wolfe},
  title={{Model-based estimation of instantaneous pitch in noisy speech}},
  year=2009,
  booktitle={Proc. Interspeech 2009},
  pages={112--115},
  doi={10.21437/Interspeech.2009-26}
}