9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Study of Integration of Statistical Model-Based Voice Activity Detection and Noise Suppression

Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani

NTT Corporation, Japan

This paper addresses robust front-end processing for automatic speech recognition (ASR) in noisy environments. To recognize corrupted speech accurately, methods that are robust against various types of interference are necessary. Noise suppression (NS) is usually used as front-end processing for ASR in noise. Voice activity detection (VAD) is also used in the front end to remove redundant non-speech periods. VAD and NS are typically combined in series. However, VAD and NS should not be treated as separate techniques, because the output information of each method can benefit the other. We therefore investigate integrated front-end processing of VAD and NS, in which each can utilize the other's input and output information. The evaluation is carried out using a concatenated speech corpus, CENSREC-1-C. In the evaluation, the proposed method improves ASR accuracy compared with the conventional series combination.

Bibliographic reference.  Fujimoto, Masakiyo / Ishizuka, Kentaro / Nakatani, Tomohiro (2008): "Study of integration of statistical model-based voice activity detection and noise suppression", In INTERSPEECH-2008, 2008-2011.