In a speech recognition system, a Voice Activity Detector (VAD) is a crucial component, both for maintaining accuracy and for reducing computational cost. Front-end approaches that drop non-speech frames typically detect speech frames using speech/non-speech classification information such as the zero-crossing rate or statistical models. These approaches discard that classification information once voice detection is complete. This paper proposes an approach that instead uses the speech/non-speech information to adjust the scores of the recognition hypotheses. Experimental results show that our approach significantly improves accuracy and, when combined with the front-end method, also reduces computational cost.
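As a rough illustration of the two ideas named in the abstract, the sketch below computes a zero-crossing rate for one frame and adds a weighted log VAD term to a hypothesis score. The abstract does not give the paper's actual formula, so the function names, the `weight` parameter, the log-linear form of the adjustment, and the `speech_prob` input are all assumptions made for illustration only.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def adjust_score(acoustic_log_score, speech_prob, is_speech_hypothesis, weight=1.0):
    """Add a weighted log VAD term to a hypothesis score.

    Hypothetical form, not taken from the paper: speech hypotheses are
    rewarded when the frame looks like speech, and silence hypotheses
    are rewarded when it does not.
    """
    p = speech_prob if is_speech_hypothesis else 1.0 - speech_prob
    p = min(max(p, 1e-10), 1.0)  # clamp to avoid log(0)
    return acoustic_log_score + weight * math.log(p)


# A rapidly alternating frame crosses zero between every pair of samples.
zcr = zero_crossing_rate([1, -1, 1, -1])

# With a frame judged 90% likely to be speech, the speech hypothesis
# keeps more of its score than the silence hypothesis does.
speech_score = adjust_score(-10.0, 0.9, is_speech_hypothesis=True)
silence_score = adjust_score(-10.0, 0.9, is_speech_hypothesis=False)
```

In this sketch the VAD measure stays available to the decoder as a soft score term rather than being reduced to a keep-or-drop decision in the front end, which is the contrast the abstract draws.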
Bibliographic reference. Oonishi, Tasuku / Dixon, Paul R. / Iwano, Koji / Furui, Sadaoki (2009): "Robust speech recognition using VAD-measure-embedded decoder", In INTERSPEECH-2009, 2239-2242.