ISCA Archive Interspeech 2009

Robust speech recognition using VAD-measure-embedded decoder

Tasuku Oonishi, Paul R. Dixon, Koji Iwano, Sadaoki Furui

In a speech recognition system, a Voice Activity Detector (VAD) is crucial not only for maintaining accuracy but also for reducing computational cost. Front-end approaches that drop non-speech frames typically detect speech frames using speech/non-speech classification information, such as the zero-crossing rate or statistical models. These approaches discard the speech/non-speech classification information after voice detection. This paper proposes an approach that uses the speech/non-speech information to adjust the scores of the recognition hypotheses. Experimental results show that our approach can improve accuracy significantly and, when combined with the front-end method, reduce computational cost.
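
The contrast drawn in the abstract, between a hard front-end decision that throws the VAD measure away and a decoder that keeps using it, can be illustrated with a minimal Python sketch. This is not the formulation used in the paper: the function names, the additive form of the adjustment, and the `weight` and `threshold` parameters are all assumptions made for illustration. Only the zero-crossing rate itself is a standard quantity.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (standard definition)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def drop_non_speech(frames, speech_scores, threshold=0.0):
    """Front-end style (hypothetical helper): keep only frames whose
    speech/non-speech score exceeds the threshold. The scores themselves
    are not passed on to the decoder."""
    return [f for f, s in zip(frames, speech_scores) if s > threshold]


def adjust_hypothesis_score(acoustic_score, speech_score, label_is_speech, weight=1.0):
    """Decoder style (assumed additive form, not taken from the paper):
    reward a hypothesis whose speech/non-speech label agrees with the
    VAD measure and penalize one that disagrees."""
    bonus = speech_score if label_is_speech else -speech_score
    return acoustic_score + weight * bonus
```

Under these assumptions, a frame with a positive `speech_score` raises the score of hypotheses that treat it as speech and lowers the score of hypotheses that treat it as silence, so the VAD measure acts as soft evidence instead of a one-time filter.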

doi: 10.21437/Interspeech.2009-636

Cite as: Oonishi, T., Dixon, P.R., Iwano, K., Furui, S. (2009) Robust speech recognition using VAD-measure-embedded decoder. Proc. Interspeech 2009, 2239-2242, doi: 10.21437/Interspeech.2009-636

  author={Tasuku Oonishi and Paul R. Dixon and Koji Iwano and Sadaoki Furui},
  title={{Robust speech recognition using VAD-measure-embedded decoder}},
  booktitle={Proc. Interspeech 2009},