INTERSPEECH 2007
8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Multimodal Speech Recognition with Ultrasonic Sensors

Bo Zhu, Timothy J. Hazen, James Glass

MIT, USA

In this research we explore multimodal speech recognition by augmenting acoustic information with that obtained by an ultrasonic emitter and receiver. After designing a hardware component to generate a stereo audio/ultrasound signal, we extract sub-band ultrasonic features that supplement conventional MFCC-based audio measurements. A simple interpolation method is used to combine audio and ultrasound model likelihoods. Experiments performed on a noisy continuous digit recognition task indicate that the addition of ultrasonic information reduces word error rates by 24-29% over a wide range of acoustic SNR (20-0 dB).
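The abstract only states that audio and ultrasound model likelihoods are combined by "a simple interpolation method"; it does not give the interpolation domain, stream weight, or number of ultrasonic sub-bands. The sketch below is an illustrative reading of that idea, not the authors' implementation: sub-band log energies stand in for the ultrasonic features, a diagonal Gaussian stands in for each stream's acoustic model, and the names subband_log_energies, interpolate_loglik, alpha, and n_bands are all assumed for illustration. The weighted sum is done in the log-likelihood domain here, which is one common choice but is not confirmed by the paper.

import numpy as np


def subband_log_energies(frame, n_bands=8):
    """Log energy in n_bands equal-width FFT sub-bands of one ultrasound frame.

    The band count is a hypothetical parameter, not taken from the paper.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    band_sums = np.array([b.sum() for b in np.array_split(power, n_bands)])
    return np.log(band_sums + 1e-10)


def diag_gaussian_loglik(feats, mean, var):
    """Per-frame log-likelihood under a diagonal-covariance Gaussian (toy model)."""
    diff = feats - mean
    return -0.5 * np.sum(diff ** 2 / var + np.log(2.0 * np.pi * var), axis=1)


def interpolate_loglik(audio_ll, ultra_ll, alpha=0.7):
    """Weighted combination of audio and ultrasonic stream log-likelihoods.

    alpha (the audio stream weight) is an assumed tuning parameter; in practice
    it would be chosen on held-out data, possibly per SNR condition.
    """
    return alpha * audio_ll + (1.0 - alpha) * ultra_ll


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy data: 100 frames of 13-dim MFCCs and 100 raw ultrasound frames.
    mfcc = rng.normal(size=(100, 13))
    ultra_frames = rng.normal(size=(100, 256))

    # Sub-band ultrasonic features, one row per frame.
    ultra_feats = np.vstack([subband_log_energies(f) for f in ultra_frames])

    # Score each stream with its own (toy) model, then interpolate.
    audio_ll = diag_gaussian_loglik(mfcc, np.zeros(13), np.ones(13))
    ultra_ll = diag_gaussian_loglik(ultra_feats, np.zeros(8), np.ones(8))
    print(interpolate_loglik(audio_ll, ultra_ll, alpha=0.7)[:5])

In a full recognizer these combined scores would replace the audio-only frame likelihoods inside the decoder; here they are simply printed to show the stream weighting.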


Bibliographic reference. Zhu, Bo / Hazen, Timothy J. / Glass, James (2007): "Multimodal speech recognition with ultrasonic sensors", in INTERSPEECH-2007, 662-665.