Interest within the automatic speech recognition (ASR) research community has recently focused on the recognition of speech captured with a microphone located in the medium field, rather than mounted on a headset and positioned next to the speaker's mouth. The capacity to recognize such speech is a primary requirement for making ASR a viable modality for so-called ubiquitous computing. This is a natural application for multiple microphones, whose signals can be combined in different ways: on the signal side, combination can be accomplished by beamforming techniques using a microphone array or by blind source separation; on the word hypothesis side, combination can be achieved through confusion network combination. In this work, we compare the effectiveness of these combination techniques and compare their performance to that achieved with a close-talking microphone.
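The signal-side combination mentioned in the abstract can be illustrated with a minimal delay-and-sum beamformer sketch. This is not the authors' implementation; the function name, sampling rate, and integer-sample steering delays are illustrative assumptions. Each channel is time-aligned by its assumed steering delay and the aligned channels are averaged, which reinforces the target signal and attenuates uncorrelated noise:

```python
import numpy as np

def delay_and_sum(signals, delays, fs):
    """Minimal delay-and-sum beamformer sketch.

    signals : (n_channels, n_samples) array of microphone signals
    delays  : per-channel steering delays in seconds (assumed known)
    fs      : sampling rate in Hz

    Each channel is shifted to compensate its steering delay
    (rounded to whole samples, circular shift for simplicity),
    then the channels are averaged.
    """
    n_channels, n_samples = signals.shape
    out = np.zeros(n_samples)
    for ch in range(n_channels):
        shift = int(round(delays[ch] * fs))
        out += np.roll(signals[ch], -shift)
    return out / n_channels
```

In practice the steering delays would be estimated from the array geometry or by cross-correlation, and fractional delays would be applied by interpolation rather than whole-sample shifts.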
Cite as: Wölfel, M., McDonough, J. (2005) Combining multi-source far distance speech recognition strategies: beamforming, blind channel and confusion network combination. Proc. Interspeech 2005, 3149-3152, doi: 10.21437/Interspeech.2005-270
@inproceedings{wolfel05b_interspeech,
  author={Matthias Wölfel and John McDonough},
  title={{Combining multi-source far distance speech recognition strategies: beamforming, blind channel and confusion network combination}},
  year={2005},
  booktitle={Proc. Interspeech 2005},
  pages={3149--3152},
  doi={10.21437/Interspeech.2005-270}
}