INTERSPEECH 2004  ICSLP

Recent work by Seltzer indicates that classical approaches to beamforming, which minimize output power while enforcing a distortionless constraint, do not yield optimal results in terms of word error rate (WER) on speech recognition tasks. This problem can be traced back to the mismatch between the criterion optimized by classical adaptive beamformers, namely the signal-to-noise ratio, and the actual target criterion, which is the reduction of the recognizer's WER. Following an approach by Seltzer, we therefore investigate the performance of an alternative error criterion, which attempts to optimize the beamformer weights so as to improve the likelihoods along the recognizer's Viterbi path for each utterance. This criterion matches the goal of lower WERs more closely and therefore leads to better recognition results.
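
The two criteria contrasted in the abstract can be written side by side. This is a minimal sketch in assumed notation rather than the paper's own formulation: w is the beamformer weight vector, R the spatial covariance matrix of the array input, d the steering vector toward the speaker, c_t(w) the cepstral feature vector of the beamformer output at frame t, and s_t the recognizer state on the Viterbi path at frame t. None of these symbols appear in the abstract itself.

```latex
% Classical criterion: minimize output power subject to a distortionless
% constraint in the look direction (the standard MVDR form).
\mathbf{w}_{\mathrm{MVDR}}
  = \arg\min_{\mathbf{w}} \; \mathbf{w}^{H} \mathbf{R}\, \mathbf{w}
  \quad \text{subject to} \quad \mathbf{w}^{H} \mathbf{d} = 1,
\qquad
\mathbf{w}_{\mathrm{MVDR}}
  = \frac{\mathbf{R}^{-1} \mathbf{d}}{\mathbf{d}^{H} \mathbf{R}^{-1} \mathbf{d}} .

% Likelihood criterion (assumed form): choose the weights that maximize the
% log-likelihood of the cepstral features along the Viterbi path s_1, ..., s_T
% of the utterance.
\hat{\mathbf{w}}
  = \arg\max_{\mathbf{w}} \; \sum_{t=1}^{T}
    \log p\!\left( \mathbf{c}_{t}(\mathbf{w}) \,\middle|\, s_{t} \right) .
```

The first objective depends only on signal statistics at the array, whereas the second is evaluated through the recognizer's acoustic model, which is why it tracks WER more closely.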
Bibliographic reference. Raub, Dominik / McDonough, John / Wölfel, Matthias (2004): "A cepstral domain maximum likelihood beamformer for speech recognition", In INTERSPEECH-2004, 817-820.