15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

F0 Estimation in Noisy Speech Based on Long-Term Harmonic Feature Analysis Combined with Neural Network Classification

Dongmei Wang, Philipos C. Loizou, John H. L. Hansen

University of Texas at Dallas, USA

In this study, we propose a frequency-domain F0 estimation approach based on long-term harmonic feature analysis combined with artificial neural network (ANN) classification. Long-term spectrum analysis is used to obtain better harmonic resolution, which reduces the spectral interference between speech and noise. Next, pitch candidates are extracted for each frame from the long-term spectrum. Five features related to harmonic structure are computed for each candidate and combined into a feature vector that indicates the status of that candidate. An ANN is trained to model the relation between the harmonic features and the true pitch values. In the test phase, the target pitch is selected from the candidates according to the maximum ANN output score. Finally, post-processing based on the average segmental output is applied to eliminate inconsistent or fluctuating decision errors. Experimental results show that the proposed algorithm outperforms several state-of-the-art F0 estimation methods under adverse conditions, including white noise and multi-speaker babble noise.
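The candidate-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the five harmonic features or the ANN, so a plain harmonic-sum score stands in for the trained network here. All function names, the synthetic test signal, and the candidate list are assumptions made for illustration only.

```python
import math

def magnitude_spectrum(signal, sample_rate):
    """Naive DFT magnitude; returns (freqs_hz, magnitudes) up to Nyquist."""
    n = len(signal)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        freqs.append(k * sample_rate / n)
        mags.append(math.hypot(re, im))
    return freqs, mags

def harmonic_sum(freqs, mags, f0, n_harmonics=5):
    """Hypothetical stand-in score: sum the magnitudes at the bins
    nearest to the first few integer multiples of the candidate f0."""
    bin_hz = freqs[1] - freqs[0]  # bin spacing = sample_rate / frame_length
    total = 0.0
    for h in range(1, n_harmonics + 1):
        k = round(h * f0 / bin_hz)
        if k < len(mags):
            total += mags[k]
    return total

def pick_f0(candidates, score_fn):
    """Select the candidate with the maximum score."""
    return max(candidates, key=score_fn)

# Synthetic voiced frame (assumed values): 200 Hz fundamental, five harmonics
# with 1/h amplitudes, 8 kHz sampling, 400 samples (50 ms).
sample_rate = 8000
frame_length = 400
true_f0 = 200
signal = [
    sum(math.sin(2 * math.pi * h * true_f0 * i / sample_rate) / h for h in range(1, 6))
    for i in range(frame_length)
]

freqs, mags = magnitude_spectrum(signal, sample_rate)
candidates = [100, 150, 200, 250, 400]  # hypothetical pitch candidates in Hz
estimated_f0 = pick_f0(candidates, lambda f0: harmonic_sum(freqs, mags, f0))
print(estimated_f0)
```

The bin spacing of the spectrum equals the sampling rate divided by the frame length (20 Hz in this sketch), so a longer analysis frame places harmonics on a finer frequency grid. That is the general motivation for long-term analysis; the frame lengths actually used in the paper are not stated in the abstract.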

Bibliographic reference. Wang, Dongmei / Loizou, Philipos C. / Hansen, John H. L. (2014): "F0 estimation in noisy speech based on long-term harmonic feature analysis combined with neural network classification", in INTERSPEECH-2014, 2258-2262.