8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

A Cepstral Domain Maximum Likelihood Beamformer for Speech Recognition

Dominik Raub, John McDonough, Matthias Wölfel

Universität Karlsruhe (TH), Germany

Recent work by Seltzer indicates that classical approaches to beamforming, which minimize output power while enforcing a distortionless constraint, do not yield optimal word error rates (WER) on speech recognition tasks. This problem can be traced to a mismatch between the criterion that classical adaptive beamformers optimize, the signal-to-noise ratio, and the actual objective, a reduction of the recognizer's WER. Following an approach by Seltzer, we therefore investigate an alternative criterion that optimizes the beamformer weights so as to improve the likelihoods along the recognizer's Viterbi path for each utterance. This criterion matches the goal of lower WERs more closely and therefore leads to better recognition results.
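
The difference between the two criteria can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a beamformer reduced to one real gain per channel, cepstra computed from a plain log power spectrum without a mel filterbank, and diagonal-covariance Gaussians with one hypothetical state per frame standing in for the recognizer's Viterbi alignment. All function and variable names are illustrative.

```python
import numpy as np

def cepstra(signal, frame_len=256, hop=128, n_ceps=13):
    """Cepstra from the log power spectrum (no mel filterbank, for brevity)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(frame_len)
    log_power = np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-10)
    n_bins = log_power.shape[1]
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_bins)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bins))  # DCT-II basis
    return log_power @ dct.T / n_bins

def path_log_likelihood(weights, channels, means, variances, path):
    """Gaussian log-likelihood of the beamformed cepstra along a fixed state path."""
    output = weights @ channels          # weighted sum over channels
    c = cepstra(output)
    mu, var = means[path], variances[path]
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (c - mu) ** 2 / var)))

def output_power(weights, channels):
    """The quantity a classical power-minimizing beamformer would reduce."""
    return float(np.mean((weights @ channels) ** 2))

# Toy data (assumption): a broadband stand-in for speech, observed on one
# lightly corrupted channel and one heavily corrupted channel.
rng = np.random.default_rng(0)
clean = rng.standard_normal(4096)
channels = np.stack([
    clean + 0.05 * rng.standard_normal(4096),
    clean + 2.0 * rng.standard_normal(4096),
])

# Hypothetical acoustic model: one unit-variance state per frame, with means
# taken from the clean cepstra, and a state path that visits them in order.
means = cepstra(clean)
variances = np.ones_like(means)
path = np.arange(means.shape[0])

ll_clean_channel = path_log_likelihood(np.array([1.0, 0.0]), channels, means, variances, path)
ll_noisy_channel = path_log_likelihood(np.array([0.0, 1.0]), channels, means, variances, path)
print(ll_clean_channel, ll_noisy_channel)
```

In this toy setup the likelihood along the path is higher when the weights favor the less corrupted channel, which is the behavior the criterion rewards. An actual system would optimize the weights iteratively, for example by gradient ascent on `path_log_likelihood`, and would re-estimate the state path from the recognizer.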


Bibliographic reference.  Raub, Dominik / McDonough, John / Wölfel, Matthias (2004): "A cepstral domain maximum likelihood beamformer for speech recognition", In INTERSPEECH-2004, 817-820.