This paper describes Harvest, a fundamental frequency (F0) estimator. Harvest is distinguished by its ability to obtain a reliable F0 contour and to reduce the error in which a voiced section is wrongly identified as unvoiced. It consists of two steps: estimation of F0 candidates and generation of a reliable F0 contour from those candidates. In the first step, the algorithm extracts the fundamental component with many band-pass filters that have different center frequencies and obtains basic F0 candidates from the filtered signals. These basic candidates are then refined and scored using the instantaneous frequency, yielding several F0 candidates in each frame. Because frame-by-frame processing based on fundamental component extraction is not robust against temporally local noise, the second step applies a connection algorithm that uses neighboring F0s. The connection exploits the fact that an F0 contour does not change precipitously over a short interval. We evaluated Harvest against several state-of-the-art algorithms using two speech databases with electroglottograph (EGG) signals. The results showed that Harvest achieved the best performance of all the algorithms compared.
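The intuition behind the second step, that an F0 contour does not jump sharply between adjacent frames, can be illustrated with a minimal Python sketch. This is not the connection algorithm used in Harvest. The function name `connect_candidates`, the `max_relative_jump` threshold and its value, and the rule for restarting a contour are all assumptions made for illustration. Each frame is assumed to carry a list of F0 candidates in Hz, ordered best-scored first.

```python
def connect_candidates(candidates, max_relative_jump=0.2):
    """Pick one F0 per frame, preferring continuity with the previous frame.

    candidates: list of per-frame lists of F0 candidates in Hz, ordered
    best-scored first. An empty list means the frame has no candidate.
    max_relative_jump: hypothetical limit on the frame-to-frame change,
    expressed as a fraction of the previous F0.
    Returns a list of F0 values, with 0.0 marking an unvoiced frame.
    """
    contour = []
    previous = 0.0  # 0.0 means the previous frame was unvoiced
    for frame in candidates:
        if not frame:
            chosen = 0.0  # no candidate: treat the frame as unvoiced
        elif previous == 0.0:
            chosen = frame[0]  # no history: take the best-scored candidate
        else:
            # Prefer the candidate closest to the previous frame's F0.
            nearest = min(frame, key=lambda f0: abs(f0 - previous))
            if abs(nearest - previous) / previous <= max_relative_jump:
                chosen = nearest
            else:
                # Every candidate jumps too far: restart from the best-scored one.
                chosen = frame[0]
        contour.append(chosen)
        previous = chosen
    return contour


# Illustrative input: the second frame's best-scored candidate is an octave error.
example = [[120.0, 240.0], [242.0, 121.0], [123.0], [], [200.0, 100.0]]
print(connect_candidates(example))
```

In the example, the second frame's top-scored candidate (242 Hz) is passed over in favor of 121 Hz, because 121 Hz is continuous with the preceding 120 Hz. A purely frame-by-frame choice would have taken the octave jump.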
Cite as: Morise, M. (2017) Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals. Proc. Interspeech 2017, 2321-2325, doi: 10.21437/Interspeech.2017-68
@inproceedings{morise17b_interspeech,
  author={Masanori Morise},
  title={{Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={2321--2325},
  doi={10.21437/Interspeech.2017-68}
}