Automatic speech recognition in real-world situations often requires the use of microphones distant from the speaker's mouth. One or several microphones are placed in the surroundings to capture multiple versions of the original signal. Recognition with a single far-field microphone yields considerably poorer performance than with person-mounted devices (headset, lapel), the main causes being reverberation and noise. Acoustic beam-forming techniques allow significant improvements over the use of a single microphone, although error rates still remain well above those obtained with close-talking microphones. In this paper we investigate the use of beam-forming in the context of speaker movement, together with commonly used adaptation techniques, and compare it against a naive multi-stream approach. We show that even such a simple approach can yield results equivalent to beam-forming, allowing for far more powerful integration of multiple microphone sources in ASR systems.
Bibliographic reference. Marino, Davide / Hain, Thomas (2011): "An analysis of automatic speech recognition with multiple microphones", In INTERSPEECH-2011, 1281-1284.