10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

A Study of Mutual Front-End Processing Method Based on Statistical Model for Noise Robust Speech Recognition

Masakiyo Fujimoto, Kentaro Ishizuka, Tomohiro Nakatani

NTT Corporation, Japan

This paper addresses robust front-end processing for automatic speech recognition (ASR) in noisy environments. Accurate recognition of noise-corrupted speech requires noise-robust front-end processing, e.g., voice activity detection (VAD) and noise suppression (NS). Typically, VAD and NS are developed independently and combined as one-way processing. However, VAD and NS should not be treated as independent techniques, because sharing information between them is important for improving front-end processing. We therefore investigate mutual front-end processing that integrates VAD and NS so that each can benefit from the other's information. In an evaluation on the CENSREC-1-C database, a concatenated speech corpus, the proposed method improves both VAD and ASR performance compared with the conventional method.

Bibliographic reference. Fujimoto, Masakiyo / Ishizuka, Kentaro / Nakatani, Tomohiro (2009): "A study of mutual front-end processing method based on statistical model for noise robust speech recognition", in INTERSPEECH-2009, 1235-1238.