We present a probabilistic framework that uses a bone sensor and an air microphone to perform speech enhancement for robust speech recognition. The system exploits the complementary advantages of the two sensors: the noise resistance of the bone sensor and the linearity of the air microphone. In this paper, we describe the general properties of the bone sensor relative to conventional air sensors. We propose a model capable of adapting to the noise conditions and evaluate its performance using a commercial speech recognition system. We demonstrate considerable improvements in recognition, from a baseline of 57% to nearly 80% word accuracy, for four subjects in a difficult condition with background speaker interference.
Cite as: Hershey, J., Kristjansson, T., Zhang, Z. (2004) Model-based fusion of bone and air sensors for speech enhancement and robust speech recognition. Proc. ITRW on Statistical and Perceptual Audio Processing (SAPA 2004), paper 139
@inproceedings{hershey04_sapa,
  author={John Hershey and Trausti Kristjansson and Zhengyou Zhang},
  title={{Model-based fusion of bone and air sensors for speech enhancement and robust speech recognition}},
  year=2004,
  booktitle={Proc. ITRW on Statistical and Perceptual Audio Processing (SAPA 2004)},
  pages={paper 139}
}