ISCA Archive Interspeech 2006

Multi-source far-distance microphone selection and combination for automatic transcription of lectures

Matthias Wölfel, Christian Fügen, Shajith Ikbal, John W. McDonough

In this work, we present our progress in multi-source far-field automatic speech-to-text transcription for lecture speech. In particular, we show how the best of several far-field channels can be selected based on a signal-to-noise ratio criterion, and how the signals from multiple channels can be combined either at the waveform level using blind channel combination or at the hypothesis level using confusion network techniques, in order to improve the accuracy of a far-field lecture transcription system. Using the techniques described here, we ran a series of experiments on the test set used by the US National Institute of Standards and Technology (NIST) for the RT-05S evaluation. For the multiple distant microphones (MDM) task of RT-05S, our system achieved a word error rate of 38.5%, which represents an improvement of over 13% absolute compared to the best reported results in the RT-05S evaluation.
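As a rough illustration of selecting a channel by a signal-to-noise ratio criterion, the sketch below estimates an SNR per channel and keeps the channel with the highest value. The abstract does not say how the SNR is computed, so the estimator here (quietest frames taken as noise, loudest frames taken as speech) is an assumption, and the names `frame_energies`, `estimate_snr_db`, and `select_best_channel` are hypothetical.

```python
import math


def frame_energies(signal, frame_len=400):
    """Mean energy of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def estimate_snr_db(signal, frame_len=400, noise_fraction=0.1):
    """Crude SNR estimate in dB.

    Assumption (not taken from the paper): the quietest frames contain
    only noise and the loudest frames contain speech plus noise.
    """
    energies = sorted(frame_energies(signal, frame_len))
    if not energies:
        raise ValueError("signal is shorter than one frame")
    k = max(1, int(len(energies) * noise_fraction))
    noise = sum(energies[:k]) / k
    speech = sum(energies[-k:]) / k
    eps = 1e-12  # guards against log of zero on digital silence
    return 10.0 * math.log10((speech + eps) / (noise + eps))


def select_best_channel(channels, **kwargs):
    """Return (index, snr_db) of the channel with the highest estimated SNR."""
    snrs = [estimate_snr_db(ch, **kwargs) for ch in channels]
    best = max(range(len(snrs)), key=snrs.__getitem__)
    return best, snrs[best]
```

Called as `select_best_channel([ch0, ch1, ch2])` with each channel given as a list of samples, the function returns the index of the selected microphone together with its estimated SNR. A real system would more likely use a speech activity detector to separate speech from noise frames.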


doi: 10.21437/Interspeech.2006-122

Cite as: Wölfel, M., Fügen, C., Ikbal, S., McDonough, J.W. (2006) Multi-source far-distance microphone selection and combination for automatic transcription of lectures. Proc. Interspeech 2006, paper 1253-Mon2BuP.5, doi: 10.21437/Interspeech.2006-122

@inproceedings{wolfel06_interspeech,
  author={Matthias Wölfel and Christian Fügen and Shajith Ikbal and John W. McDonough},
  title={{Multi-source far-distance microphone selection and combination for automatic transcription of lectures}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1253-Mon2BuP.5},
  doi={10.21437/Interspeech.2006-122}
}