In this paper, we propose a computational auditory scene analysis (CASA)-based front-end for two-microphone speech recognition in a car environment. One of the important issues associated with CASA is the accurate estimation of the mask information used to separate target speech from noisy multi-microphone speech. For this task, the time-frequency mask information is compensated using the signal-to-noise ratio (SNR) obtained from a beamformer, so as to adjust for the amount of noise included in the noisy speech. We evaluate the performance of an automatic speech recognition system employing a CASA-based front-end with the proposed mask compensation method, and we compare it with the performance of systems employing a CASA-based front-end without mask compensation and a beamforming-based front-end. The CASA-based front-end with the proposed method achieves relative word error rate (WER) reductions of 26.52% and 8.57% compared with the beamformer and the CASA-based front-end alone, respectively.
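For readers unfamiliar with the metric, the relative WER reductions quoted above follow the usual definition: the drop in WER divided by the baseline WER. The following is a minimal sketch of that arithmetic only; the function name and the WER values are hypothetical placeholders for illustration and are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, proposed_wer: float) -> float:
    """Return the relative WER reduction of a proposed system, in percent.

    Both arguments are word error rates in the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - proposed_wer) / baseline_wer


# Hypothetical WERs chosen for illustration (NOT results from the paper):
# a baseline at 20.0% WER and a proposed system at 15.0% WER.
example_reduction = relative_wer_reduction(20.0, 15.0)
print(f"relative WER reduction: {example_reduction:.2f}%")
```

Note that a relative reduction is measured against each baseline separately, which is why the abstract reports two different percentages for the same proposed system.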
Bibliographic reference. Park, Ji Hun / Kim, Seon Man / Yoon, Jae Sam / Kim, Hong Kook / Lee, Sung Joo / Lee, Yunkeun (2010): "SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment", In INTERSPEECH-2010, 725-728.