Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors

Kanru Hua


An F0 and voicing status estimation algorithm for high-quality speech analysis/synthesis is proposed. Instead of modeling speech signals directly, the method models the behavior of feature extractors under noise. Under time-frequency locality assumptions, the joint distribution of extracted features and target F0 can be characterized by training a bank of Gaussian mixture models (GMMs) on artificial data generated from Monte-Carlo simulations. The trained GMMs are then used to generate a set of conditional distributions on the predicted F0, which are combined and post-processed by the Viterbi algorithm to give a final F0 trajectory. Evaluation on the CSTR and CMU Arctic speech databases shows that the proposed method, trained on fully synthetic data, achieves lower gross error rates than state-of-the-art methods.
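
The final post-processing step can be illustrated with a minimal sketch of Viterbi decoding over per-frame F0 distributions. This is not the paper's implementation: the function name viterbi_f0, the quantization of F0 into discrete bins, the Gaussian penalty on bin-to-bin jumps, and the jump_std parameter are all assumptions made for illustration, and voicing states are left out.

```python
import numpy as np

def viterbi_f0(log_post, jump_std=2.0):
    """Return the most likely F0 bin index for each frame.

    log_post: array of shape (T, K) holding per-frame log-probabilities
    over K candidate F0 bins (hypothetical input format).
    jump_std: assumed spread, in bins, of frame-to-frame F0 movement.
    """
    T, K = log_post.shape
    idx = np.arange(K)
    # Assumed transition model: Gaussian penalty on the jump between bins,
    # normalized so that each row is a proper distribution.
    log_trans = -0.5 * ((idx[:, None] - idx[None, :]) / jump_std) ** 2
    log_trans -= np.log(np.exp(log_trans).sum(axis=1, keepdims=True))

    score = log_post[0].copy()          # best log-score ending in each bin
    back = np.zeros((T, K), dtype=int)  # best predecessor for each bin
    for t in range(1, T):
        cand = score[:, None] + log_trans   # cand[i, j]: move from bin i to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_post[t]

    # Trace the best path backwards from the best final bin.
    path = np.empty(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Compared with picking the most probable bin independently in each frame, this kind of decoding suppresses isolated jumps, such as a single frame whose distribution briefly favors a distant F0 candidate.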


DOI: 10.21437/Interspeech.2018-1258

Cite as: Hua, K. (2018) Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors. Proc. Interspeech 2018, 337-341, DOI: 10.21437/Interspeech.2018-1258.


@inproceedings{Hua2018,
  author={Kanru Hua},
  title={Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={337--341},
  doi={10.21437/Interspeech.2018-1258},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1258}
}