16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Minimum Word Error Training of RNN-Based Voice Activity Detection

Gregory Gelly, Jean-Luc Gauvain

LIMSI, France

Voice Activity Detection (VAD) is critical in speech recognition systems, as it can dramatically impact recognition accuracy, especially on noisy data. This paper presents a novel method that applies Minimum Word Error (MWE) training to a Long Short-Term Memory RNN to optimize Voice Activity Detection for speech recognition. Experiments compare the speech recognition WERs obtained using the RNN VAD with those of other commonly used VAD methods on two corpora: the conversational Vietnamese corpus used in the NIST OpenKWS13 evaluation and a corpus of French telephone conversations. The proposed VAD method, combining MWE training with an RNN, yields the best ASR results. This MWE training scheme appears to be particularly useful for low-resource ASR tasks, as exemplified by the IARPA BABEL data.
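
For readers unfamiliar with the metric used in the comparison, the following is a minimal sketch of the standard word error rate (WER): the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. It is not the authors' code and does not implement the MWE training objective itself; the function name word_error_rate and the example sentences are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper only (hypothetical name); it uses plain
    whitespace tokenization and the standard Levenshtein recurrence.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# A word missing from the hypothesis counts as one deletion.
print(word_error_rate("hello how are you", "how are you"))  # 0.25
# An extra word in the hypothesis counts as one insertion.
print(word_error_rate("a b", "a x b"))  # 0.5
```

As a rough intuition, a VAD that discards speech frames tends to surface as deletions in this count, while one that passes noise to the recognizer can surface as insertions, which is why WER is a natural criterion for comparing VAD methods.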


Bibliographic reference. Gelly, Gregory / Gauvain, Jean-Luc (2015): "Minimum word error training of RNN-based voice activity detection", in INTERSPEECH-2015, 2650-2654.