In this research we explore multimodal speech recognition by augmenting acoustic information with measurements from an ultrasonic emitter and receiver. After designing a hardware component to generate a stereo audio/ultrasound signal, we extract sub-band ultrasonic features that supplement conventional MFCC-based audio measurements. A simple interpolation method is used to combine audio and ultrasound model likelihoods. Experiments performed on a noisy continuous digit recognition task indicate that the addition of ultrasonic information reduces word error rates by 24-29% over a wide range of acoustic SNRs (20 to 0 dB).
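The "simple interpolation" of audio and ultrasound model likelihoods can be sketched as a weighted combination of per-hypothesis log-likelihoods from the two streams. This is a minimal illustration, not the paper's exact formulation; the function name, the interpolation weight, and the example scores are all hypothetical.

```python
import numpy as np

def interpolate_log_likelihoods(ll_audio, ll_ultra, weight=0.6):
    """Linearly interpolate log-likelihoods from two model streams.

    weight is the interpolation weight on the acoustic stream
    (a hypothetical value; the paper's actual weighting may differ).
    """
    ll_audio = np.asarray(ll_audio, dtype=float)
    ll_ultra = np.asarray(ll_ultra, dtype=float)
    return weight * ll_audio + (1.0 - weight) * ll_ultra

# Example: combine scores for three competing digit hypotheses
audio_scores = [-42.1, -45.8, -44.0]   # acoustic model log-likelihoods
ultra_scores = [-30.5, -28.9, -33.2]   # ultrasonic model log-likelihoods
combined = interpolate_log_likelihoods(audio_scores, ultra_scores)
best = int(np.argmax(combined))        # pick the highest combined score
```

In noisy conditions the acoustic scores become unreliable, so the ultrasonic stream can shift the decision; tuning the weight per SNR condition is one natural design choice.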
Bibliographic reference. Zhu, Bo / Hazen, Timothy J. / Glass, James (2007): "Multimodal speech recognition with ultrasonic sensors", In INTERSPEECH-2007, 662-665.