Voice activity detection (VAD) typically uses a representation of speech derived
from spectral analysis, followed by a statistical characterization of speech and
of the degrading noise. Features derived using such traditional methods may not
be adequate for VAD in the presence of transient noises. In this paper, we focus
on transient noises, for which most VAD systems in the literature perform poorly.
A representation with both high temporal resolution and high frequency resolution
is used to discriminate transient noises from speech.
This high temporal and frequency resolution representation is obtained by
filtering the signal at several individual frequencies. The single frequency
filtering (SFF) approach helps to isolate the regions of transient noise in a
signal. A time-varying threshold, based on the spectral variance and the temporal
variance of the speech signal, is proposed to detect transient noise. The
remaining regions are then processed with the spectral variance measure for VAD.
The results are compared with those of the Adaptive Multi-Rate (AMR) VAD methods.
The proposed method performs consistently better, owing to the instantaneous
nature of its features, and it detects a higher percentage of transient noise
than the methods reported in the literature.
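The single frequency filtering step and the two variance measures described above can be sketched in Python. This is a minimal sketch, assuming the commonly described SFF formulation. Each analysis frequency f_k is shifted to half the sampling rate, and the shifted signal is passed through a single-pole filter with pole radius r. The value r = 0.995, the analysis frequency grid, and the mean normalization in `spectral_variance` are assumptions. The window used for the temporal variance is also an assumption. The function names `sff_envelopes`, `spectral_variance` and `temporal_variance` are hypothetical and not taken from the paper. The time-varying threshold itself is not reproduced, because its exact form is not given here.

```python
import cmath
import math
from statistics import fmean, pvariance


def sff_envelopes(x, fs, freqs_hz, r=0.995):
    """Single frequency filtering: one envelope per analysis frequency.

    Assumed formulation: the component at f_k is shifted to fs/2 by
    multiplying x[n] with exp(j*w_hat*n), w_hat = pi - 2*pi*f_k/fs, and the
    shifted signal is filtered by H(z) = 1 / (1 + r z^-1), whose pole at
    z = -r lies at fs/2. The envelope |y_k[n]| is available at every sample
    n for every f_k, which gives the high temporal resolution.
    """
    envelopes = []
    for f in freqs_hz:
        w_hat = math.pi - 2.0 * math.pi * f / fs
        y = 0j
        env = []
        for n, sample in enumerate(x):
            # y[n] = x_shifted[n] - r * y[n-1]
            y = sample * cmath.exp(1j * w_hat * n) - r * y
            env.append(abs(y))
        envelopes.append(env)
    return envelopes


def spectral_variance(envelopes):
    """Variance across frequency of the envelopes at each sample n.

    Envelopes are divided by their mean at that instant so the measure does
    not depend on signal level (an assumed normalization). Peaky spectra
    give large values and spectrally flat signals give small ones.
    """
    out = []
    for column in zip(*envelopes):
        m = fmean(column)
        out.append(pvariance([e / m for e in column]) if m > 0 else 0.0)
    return out


def temporal_variance(envelopes, win):
    """Variance over the last `win` samples of the frequency-averaged envelope."""
    avg = [fmean(column) for column in zip(*envelopes)]
    return [pvariance(avg[max(0, n - win + 1): n + 1]) for n in range(len(avg))]


if __name__ == "__main__":
    import random

    fs, n_samples = 8000, 800
    freqs = [500, 1000, 1500, 2000, 2500, 3000]  # hypothetical analysis grid
    tone = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(n_samples)]
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for name, sig in (("tone", tone), ("noise", noise)):
        sv = spectral_variance(sff_envelopes(sig, fs, freqs))
        print(name, round(fmean(sv[-200:]), 2))
```

In this usage example, the 1000 Hz tone concentrates its energy in one channel and yields a large spectral variance. White noise spreads its energy roughly evenly across the channels and yields a small one. This is the kind of contrast a spectral variance measure relies on. Because the envelopes are updated at every sample, both measures can change as quickly as a transient does.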
Bibliographic reference. Aneeja, G. / Yegnanarayana, B. (2014): "Speech detection in transient noises", In INTERSPEECH-2014, 2356-2360.