In this paper we present a new method of signal processing for robust speech recognition using multiple microphones. The method, loosely based on the human binaural hearing system, consists of passing the speech signals detected by multiple microphones through bandpass filtering and nonlinear halfwave rectification operations, and then cross-correlating the outputs from each channel within each frequency band. These operations provide rejection of off-axis interfering signals. These operations are repeated (in a non-physiological fashion) for the negative of the signal, and an estimate of the desired signal is obtained by combining the positive and negative outputs. We demonstrate that the use of this approach provides substantially better recognition accuracy than delay-and-sum beamforming using the same sensors for target signals in the presence of additive broadband and speech maskers. Improvements in reverberant environments are tangible but more modest.
Acoustic Examples(HTML file opening in a new window)
Bibliographic reference. Stern, Richard M. / Gouvêa, Evandro B. / Thattai, Govindarajan (2007): ""polyaural" array processing for automatic speech recognition in degraded environments", In INTERSPEECH-2007, 926-929.