This paper presents a model-based voice activity detector (VAD) designed to operate in low signal-to-noise ratio (SNR) conditions and non-stationary noise environments. The proposed system uses Gaussian mixture models trained on Mel-frequency cepstral coefficients extracted from noisy speech data. In addition, smoothed frame-based log energy is used to augment the system so that it detects voice activity more accurately. Finally, the preliminary decisions made by the system are post-processed to remove some false acceptances, which further improves performance. Experimental results show that the proposed VAD significantly outperforms the system that currently produces state-of-the-art results on the QUT-NOISE-TIMIT database, with relative improvements of 34.58%, 17.18% and 3.5% for high, medium and low SNR scenarios respectively.
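The energy and post-processing stages mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give frame sizes, the smoothing method, or the post-processing rule, so the function names (`frame_log_energy`, `smooth`, `remove_short_speech_runs`), the moving-average smoother, and every parameter value below are assumptions chosen for illustration. The Gaussian mixture model stage is omitted.

```python
import numpy as np


def frame_log_energy(signal, frame_len=400, hop=160, eps=1e-10):
    """Log energy per frame. frame_len and hop are assumed values
    (25 ms / 10 ms at 16 kHz), not taken from the paper."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies[i] = np.log(np.sum(frame ** 2) + eps)
    return energies


def smooth(values, width=5):
    """Moving-average smoothing (an assumed choice of smoother)."""
    kernel = np.ones(width) / width
    return np.convolve(values, kernel, mode="same")


def remove_short_speech_runs(decisions, min_len=3):
    """Flip speech runs shorter than min_len frames to non-speech,
    one plausible way to discard isolated false acceptances."""
    cleaned = decisions.copy()
    start = None
    # A trailing False closes any run that reaches the last frame.
    for i, is_speech in enumerate(np.append(decisions, False)):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            if i - start < min_len:
                cleaned[start:i] = False
            start = None
    return cleaned
```

Under these assumptions, a preliminary decision could be formed as `smooth(frame_log_energy(x)) > threshold` for some hypothetical `threshold`, then passed through `remove_short_speech_runs` before being combined with the model-based scores.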
Bibliographic reference. Sriskandaraja, Kaavya / Sethu, Vidhyasaharan / Le, Phu Ngoc / Ambikairajah, Eliathamby (2015): "A model based voice activity detector for noisy environments", In INTERSPEECH-2015, 2297-2301.