ISCA Archive Interspeech 2006

Voice activity detection in personal audio recordings using autocorrelogram compensation

Keansub Lee, Daniel P. W. Ellis

This paper presents a novel method for identifying regions of speech in the kinds of energetic and highly-variable noise present in ‘personal audio’ collected by body-worn continuous recorders. Motivated by psychoacoustic evidence that pitch is crucial in the perception and organization of sound, we use a noise-robust pitch detection algorithm to locate speech-like regions. To avoid false alarms resulting from background noise with strong periodic components (such as air-conditioning), we add a new channel selection scheme to suppress frequency subbands where the autocorrelation is more stationary than that encountered in voiced speech. Quantitative evaluation shows that these harmonic noises are effectively removed by this compensation technique in the autocorrelogram domain, and that detection performance is significantly better than existing algorithms for detecting the presence of speech in real-world personal audio recordings.
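
The channel selection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`frame_autocorr`, `stationarity`, `select_channels`), the frame sizes, the cosine-similarity stationarity score, and the `threshold` value are all assumptions chosen for illustration, and the subband signals are assumed to come from some filterbank that is not shown. The sketch computes a short-time autocorrelation per subband, scores how little it changes over time, and drops subbands whose autocorrelation is nearly constant, as a steady hum would produce.

```python
import numpy as np


def frame_autocorr(x, frame_len=400, hop=160, max_lag=200):
    """Normalized short-time autocorrelation of one subband signal.

    Returns an array of shape (n_frames, max_lag). Frame sizes are
    illustrative assumptions, not values taken from the paper.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    ac = np.zeros((n_frames, max_lag))
    for t in range(n_frames):
        frame = x[t * hop : t * hop + frame_len]
        # Keep non-negative lags only; index frame_len - 1 is lag 0.
        full = np.correlate(frame, frame, mode="full")[frame_len - 1 :]
        ac[t] = full[:max_lag] / (full[0] + 1e-12)
    return ac


def stationarity(ac):
    """Mean cosine similarity between each frame's autocorrelation and
    the long-term average. Values near 1 mean the autocorrelation barely
    changes over time. This score is an assumed stand-in, not the
    paper's measure.
    """
    mean_ac = ac.mean(axis=0)
    num = ac @ mean_ac
    den = np.linalg.norm(ac, axis=1) * np.linalg.norm(mean_ac) + 1e-12
    return float(np.mean(num / den))


def select_channels(subbands, threshold=0.95):
    """Return (keep_mask, scores) for a list of subband signals.

    Subbands whose stationarity score reaches the (hypothetical)
    threshold are marked False so that later pitch detection ignores them.
    """
    scores = np.array([stationarity(frame_autocorr(s)) for s in subbands])
    return scores < threshold, scores
```

In this sketch a constant-frequency tone yields almost identical autocorrelation in every frame and is suppressed, while a signal whose pitch drifts, as voiced speech does, yields a lower score and is kept. A real system would tune the threshold on data rather than use the value assumed here.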


doi: 10.21437/Interspeech.2006-540

Cite as: Lee, K., Ellis, D.P.W. (2006) Voice activity detection in personal audio recordings using autocorrelogram compensation. Proc. Interspeech 2006, paper 1753-Wed3A1O.5, doi: 10.21437/Interspeech.2006-540

@inproceedings{lee06f_interspeech,
  author={Keansub Lee and Daniel P. W. Ellis},
  title={{Voice activity detection in personal audio recordings using autocorrelogram compensation}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1753-Wed3A1O.5},
  doi={10.21437/Interspeech.2006-540}
}