ISCA Archive ISCSLP 2002

An algorithm for voiced / unvoiced decision and pitch estimation in speech feature extraction

Dong Wang, Yi-Ning Chen, Jia Liu

An algorithm that combines voiced/unvoiced (U/V) decision and pitch estimation is proposed as part of an enhanced MFCC feature extraction process. The residual energy of LPC analysis and the normalized autocorrelation are computed, and static and dynamic thresholds are applied to classify each speech segment as voiced, unvoiced, or transitional. The pitch is then estimated by a dynamic programming (DP) algorithm, and a subsequent harmonic peak-picking step refines the result using additional spectral information. The algorithm is strengthened by a finite state machine (FSM) embedded in the U/V decision, which converts the static thresholds into dynamically varying thresholds and so represents actual speech more accurately. Experiments show that the word recognition rate improves from 71.49% to 74.42% on the National 863 standard Mandarin speech corpus.
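As a rough illustration of the three-class decision described above, the sketch below computes the normalized autocorrelation peak of one frame and maps it to voiced, unvoiced, or transitional using two static thresholds. It is only a sketch under stated assumptions, not the authors' implementation: the function names, the lag range, and the threshold values 0.7 and 0.3 are hypothetical, and the LPC residual energy, the FSM-driven dynamic thresholds, the DP pitch tracking, and the harmonic peak picking are all omitted.

```python
import math


def normalized_autocorr_peak(frame, min_lag, max_lag):
    """Return (lag, value) of the largest normalized autocorrelation
    found for lags in [min_lag, max_lag]. Values lie in [-1, 1]."""
    n = len(frame)
    best_lag, best_val = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        a = frame[:n - lag]   # frame without its last `lag` samples
        b = frame[lag:]       # frame shifted by `lag` samples
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        val = num / den if den > 0 else 0.0
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val


def classify(peak, voiced_thr=0.7, unvoiced_thr=0.3):
    """Three-way decision from static thresholds.
    The threshold values are illustrative assumptions, not from the paper."""
    if peak >= voiced_thr:
        return "voiced"
    if peak <= unvoiced_thr:
        return "unvoiced"
    return "transitional"


# Synthetic, strongly periodic frame with a period of 40 samples.
frame = [math.sin(2 * math.pi * n / 40) for n in range(320)]
lag, peak = normalized_autocorr_peak(frame, min_lag=20, max_lag=100)
label = classify(peak)
```

For the synthetic frame, the selected lag is a multiple of the 40-sample period; at an assumed sampling rate of 8 kHz, a lag of 40 samples would correspond to a pitch of 200 Hz. In the paper's scheme the thresholds are not fixed as they are here but are adjusted by the FSM according to the decision history.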


Cite as: Wang, D., Chen, Y.-N., Liu, J. (2002) An algorithm for voiced / unvoiced decision and pitch estimation in speech feature extraction. Proc. International Symposium on Chinese Spoken Language Processing, paper 35

@inproceedings{wang02l_iscslp,
  author={Dong Wang and Yi-Ning Chen and Jia Liu},
  title={{An algorithm for voiced / unvoiced decision and pitch estimation in speech feature extraction}},
  year=2002,
  booktitle={Proc. International Symposium on Chinese Spoken Language Processing},
  pages={paper 35}
}