11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

SNR-Based Mask Compensation for Computational Auditory Scene Analysis Applied to Speech Recognition in a Car Environment

Ji Hun Park (1), Seon Man Kim (1), Jae Sam Yoon (1), Hong Kook Kim (1), Sung Joo Lee (2), Yunkeun Lee (2)

(1) GIST, Korea
(2) ETRI, Korea

In this paper, we propose a computational auditory scene analysis (CASA)-based front-end for two-microphone speech recognition in a car environment. An important issue in CASA is the accurate estimation of the mask information used to separate target speech from noisy speech recorded with multiple microphones. To this end, the time-frequency mask is compensated using the signal-to-noise ratio (SNR) obtained from a beamformer, in order to adjust the amount of noise included in the noisy speech. We evaluate the performance of an automatic speech recognition system employing a CASA-based front-end with the proposed mask compensation method, and compare it with systems employing a CASA-based front-end without mask compensation and a beamforming-based front-end. The CASA-based front-end with the proposed method achieves relative word error rate (WER) reductions of 26.52% and 8.57% compared with the beamformer and the CASA-based front-end alone, respectively.
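
For readers unfamiliar with the metric, the relative WER reductions quoted in the abstract follow the standard definition: the drop in WER divided by the baseline WER. The short Python sketch below illustrates that arithmetic only. The function name and the WER values are hypothetical and are not taken from the paper, whose absolute WERs are not given here.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction of new_wer over baseline_wer, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent), chosen only to illustrate the formula;
# they are NOT results from the paper.
baseline = 20.0
proposed = 15.0
print(relative_wer_reduction(baseline, proposed))  # 25.0
```

A relative reduction is therefore not a difference in percentage points: in the hypothetical case above, a drop of 5 points from a 20% baseline is a 25% relative reduction.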

Bibliographic reference.  Park, Ji Hun / Kim, Seon Man / Yoon, Jae Sam / Kim, Hong Kook / Lee, Sung Joo / Lee, Yunkeun (2010): "SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment", In INTERSPEECH-2010, 725-728.