10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

Robust Speech Recognition Using VAD-Measure-Embedded Decoder

Tasuku Oonishi (1), Paul R. Dixon (1), Koji Iwano (2), Sadaoki Furui (1)

(1) Tokyo Institute of Technology, Japan
(2) Tokyo City University, Japan

In a speech recognition system, a Voice Activity Detector (VAD) is a crucial component, both for maintaining accuracy and for reducing computational cost. Front-end approaches that drop non-speech frames typically detect speech frames using speech/non-speech classification information, such as the zero-crossing rate or statistical models, and then discard that information after voice detection. This paper proposes an approach that instead uses the speech/non-speech information to adjust the scores of the recognition hypotheses. Experimental results show that our approach significantly improves accuracy and, when combined with the front-end method, also reduces computational cost.
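The abstract does not state the scoring formula, so the following is only a minimal sketch of the general idea, under assumptions that are not taken from the paper: the VAD is assumed to output a per-frame speech posterior, the adjustment is assumed to be a weighted log-odds term added to the hypothesis score, and the names `vad_log_ratio`, `adjusted_score`, `drop_non_speech`, `weight`, and `threshold` are all hypothetical.

```python
import math


def vad_log_ratio(p_speech, eps=1e-6):
    """Log-odds of speech for one frame, given a speech posterior in [0, 1].

    Assumption: the VAD exposes a posterior; the paper may use another measure.
    """
    p = min(max(p_speech, eps), 1.0 - eps)  # clamp to avoid log(0)
    return math.log(p / (1.0 - p))


def adjusted_score(acoustic_log_score, is_speech_state, p_speech, weight=1.0):
    """Embed the VAD measure in a hypothesis score for one frame (hypothetical).

    A hypothesis in a speech state is rewarded when the VAD favors speech and
    penalized when it favors non-speech; the sign flips for non-speech states.
    """
    llr = vad_log_ratio(p_speech)
    bonus = llr if is_speech_state else -llr
    return acoustic_log_score + weight * bonus


def drop_non_speech(frames, posteriors, threshold=0.1):
    """Front-end step: keep only frames whose speech posterior meets a threshold.

    Assumption: a simple posterior threshold stands in for the front-end VAD.
    """
    return [f for f, p in zip(frames, posteriors) if p >= threshold]
```

In this sketch the front-end filter removes frames that are very unlikely to be speech, which reduces the number of frames the decoder must process, while the score adjustment keeps the remaining speech/non-speech evidence available during decoding instead of discarding it.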

Bibliographic reference. Oonishi, Tasuku / Dixon, Paul R. / Iwano, Koji / Furui, Sadaoki (2009): "Robust speech recognition using VAD-measure-embedded decoder", in INTERSPEECH-2009, 2239-2242.