8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Noise Robust Front-End Processing with Voice Activity Detection Based on Periodic to Aperiodic Component Ratio

Kentaro Ishizuka, Tomohiro Nakatani, Masakiyo Fujimoto, Noboru Miyazaki

NTT Corporation, Japan

This paper proposes a front-end processing method for automatic speech recognition (ASR) that employs a voice activity detection (VAD) method based on the periodic to aperiodic component ratio (PAR). The proposed VAD method is called PARADE (PAR based Activity DEtection). PARADE considers the powers of the periodic and aperiodic components of the observed signal simultaneously. As a result, it can detect speech segments in noise more precisely than conventional VAD methods. In this paper, PARADE is incorporated into a front-end that employs a robust feature extraction method called SPADE (Subband based Periodicity and Aperiodicity DEcomposition). ASR performance in noise was examined with the CENSREC-1-C database, which contains connected digit utterances drawn from CENSREC-1 (the Japanese version of AURORA-2). The results show that the SPADE front-end combined with PARADE achieves an average word accuracy of 74.22% at signal-to-noise ratios of 0 to 20 dB. This accuracy is significantly higher than those achieved by the ETSI ES 202 050 front-end (63.66%) and by the SPADE front-end without PARADE (64.28%). These results confirm that PARADE can improve the performance of front-end processing.
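The abstract describes PARADE only at a high level: a frame is judged by comparing the power of its periodic component with that of its aperiodic component. The toy sketch below illustrates that idea. It is a hypothetical stand-in, not the authors' estimator. It assumes that the peak normalized autocorrelation r over plausible pitch lags approximates the periodic fraction of the frame power, and it treats the remaining 1 - r as aperiodic. The function names, the 60 to 400 Hz pitch range, and the 0 dB threshold are all illustrative assumptions.

```python
import math
import random


def normalized_autocorr(frame, lag):
    """Normalized correlation between the frame and itself shifted by `lag` samples."""
    a = frame[:-lag]
    b = frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def toy_par_db(frame, fs, f0_min=60.0, f0_max=400.0, eps=1e-6):
    """Hypothetical periodic-to-aperiodic ratio in dB (a stand-in, not PARADE).

    The peak normalized autocorrelation r over candidate pitch lags is treated
    as the periodic share of the frame power; 1 - r is treated as aperiodic.
    """
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    r = max(normalized_autocorr(frame, lag) for lag in range(lag_min, lag_max + 1))
    r = min(max(r, 0.0), 1.0)  # clamp to a valid power fraction
    return 10.0 * math.log10((r + eps) / (1.0 - r + eps))


def is_speech(frame, fs, threshold_db=0.0):
    """Frame-level decision: speech if periodic power outweighs aperiodic power."""
    return toy_par_db(frame, fs) > threshold_db


# Usage: 50 ms frames at 8 kHz.
fs = 8000
random.seed(0)
voiced = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]  # periodic only
noise = [random.gauss(0.0, 1.0) for _ in range(400)]                 # aperiodic only
noisy_voiced = [v + 0.3 * random.gauss(0.0, 1.0) for v in voiced]    # about 7 dB SNR

for name, frame in [("voiced", voiced), ("noise", noise), ("noisy_voiced", noisy_voiced)]:
    print(name, round(toy_par_db(frame, fs), 1), is_speech(frame, fs))
```

In this toy model, additive noise lowers r and therefore the ratio, which roughly tracks the frame SNR. A practical VAD would also smooth the per-frame decisions over time.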
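The reported accuracies can also be restated as relative reductions in word error. This rests on the assumption that word error here is simply 100% minus word accuracy. The short computation below derives those reductions from the three figures quoted above. The function name is hypothetical.

```python
def relative_error_reduction(baseline_acc: float, improved_acc: float) -> float:
    """Relative word-error reduction (%), taking error = 100 - word accuracy (%)."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return 100.0 * (baseline_err - improved_err) / baseline_err


# Average word accuracies (%) at 0 to 20 dB SNR, as reported in the abstract.
spade_parade = 74.22
etsi_es_202_050 = 63.66
spade_only = 64.28

print(f"vs. ETSI ES 202 050: {relative_error_reduction(etsi_es_202_050, spade_parade):.1f}%")  # 29.1%
print(f"vs. SPADE alone:     {relative_error_reduction(spade_only, spade_parade):.1f}%")       # 27.8%
```

Under that assumption, adding PARADE removes roughly 28 to 29% of the remaining word errors relative to either baseline.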


Bibliographic reference.  Ishizuka, Kentaro / Nakatani, Tomohiro / Fujimoto, Masakiyo / Miyazaki, Noboru (2007): "Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio", In INTERSPEECH-2007, 230-233.