ISCA Archive Interspeech 2005

Fundamental frequency and voicing prediction from MFCCs for speech reconstruction from unconstrained speech

Ben Milner, Xu Shao, Jonathan Darch

This work proposes a method to predict the fundamental frequency and voicing of a frame of speech from its MFCC representation. This is particularly useful in distributed speech recognition systems, where the ability to predict fundamental frequency and voicing allows a time-domain speech signal to be reconstructed solely from the MFCC vectors. Prediction is achieved by modeling the joint density of MFCCs and fundamental frequency with a combined hidden Markov model-Gaussian mixture model (HMM-GMM) framework. Prediction results are presented on unconstrained speech using both a speaker-dependent database and a speaker-independent database. Spectrogram comparisons of the reconstructed and original speech are also made. The results show a fundamental frequency prediction error of 3.1% for the speaker-dependent task, rising to 8.3% for the speaker-independent task.
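The core idea of joint-density prediction can be sketched as follows. This is not the paper's implementation (which adds an HMM layer over the GMMs); it is a minimal illustration of predicting f0 from acoustic features as the MMSE estimate under a joint GMM, using synthetic 2-D features as stand-ins for MFCC vectors and an invented linear f0/feature relation.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data: 2-D "MFCC" features and an illustrative
# linear f0/feature relation (purely for demonstration).
rng = np.random.default_rng(0)
n, d = 2000, 2
x = rng.normal(size=(n, d))
f0 = 120 + 30 * x[:, 0] + 10 * x[:, 1] + rng.normal(scale=5, size=n)
z = np.column_stack([x, f0])          # joint [features, f0] vectors

# Model the joint density p(features, f0) with a GMM.
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(z)

def predict_f0(gmm, x, d):
    """MMSE estimate E[f0 | x]: a posterior-weighted sum of
    per-component conditional Gaussian means."""
    mu_x, mu_y = gmm.means_[:, :d], gmm.means_[:, d]
    Sxx = gmm.covariances_[:, :d, :d]   # feature-feature covariance
    Syx = gmm.covariances_[:, d, :d]    # f0-feature cross-covariance
    K = gmm.n_components
    # Component posteriors p(k | x) from the marginal over the features.
    lik = np.stack([multivariate_normal.pdf(x, mu_x[k], Sxx[k])
                    for k in range(K)])
    w = gmm.weights_[:, None] * lik
    w /= w.sum(axis=0)
    # Per-component conditional means: mu_y + Syx Sxx^{-1} (x - mu_x).
    cond = np.stack([mu_y[k]
                     + (x - mu_x[k]) @ np.linalg.solve(Sxx[k], Syx[k])
                     for k in range(K)])
    return (w * cond).sum(axis=0)

pred = predict_f0(gmm, x, d)
```

In the paper, a voicing decision accompanies the f0 estimate and an HMM models temporal structure; the conditional-Gaussian step above is only the static building block.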


doi: 10.21437/Interspeech.2005-174

Cite as: Milner, B., Shao, X., Darch, J. (2005) Fundamental frequency and voicing prediction from MFCCs for speech reconstruction from unconstrained speech. Proc. Interspeech 2005, 321-324, doi: 10.21437/Interspeech.2005-174

@inproceedings{milner05_interspeech,
  author={Ben Milner and Xu Shao and Jonathan Darch},
  title={{Fundamental frequency and voicing prediction from MFCCs for speech reconstruction from unconstrained speech}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={321--324},
  doi={10.21437/Interspeech.2005-174}
}