COST278 and ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction
University of East Anglia, Norwich, UK
This paper summarizes recent developments and experimental results in Automatic Speech Recognition (ASR) using signals captured with a throat microphone. Because the sensor is close to the voice source, the signal is naturally less subject to background noise. However, the resulting speech sounds have a different frequency content from those captured with traditional microphones, which calls for specific acoustic models. We propose to use the information from both signals by combining the probability vectors provided by the two acoustic models.
The systems are evaluated on a connected digit recognition task in French. A database has been recorded both for training the acoustic models and for testing the whole setup. It contains both throat and "ordinary" close-talk signals. To avoid any possibly unrealistic assumption about the effect of noise on each signal, the test portion was acquired with background noise played back through loudspeakers.
The ASR experiments that we carried out demonstrate the benefit of using alternative microphones. Relative recognition improvements as high as 80% were obtained on sequences of digits recorded in a loud musical environment.
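As an illustration of what combining the probability vectors of two acoustic models can look like, the sketch below merges two per-frame posterior vectors with a weighted geometric (log-linear) rule, a common choice in multi-stream ASR. The abstract does not state which combination rule the authors used, so the rule itself, the function name combine_posteriors, the weight w_close, the floor value and the example numbers are all assumptions made for illustration only.

```python
import math


def combine_posteriors(p_close, p_throat, w_close=0.5, floor=1e-10):
    """Weighted geometric (log-linear) combination of two posterior vectors.

    Assumption: this rule is illustrative; the abstract does not specify
    the combination rule used by the authors.
    """
    if len(p_close) != len(p_throat):
        raise ValueError("both vectors must cover the same classes")
    if not 0.0 <= w_close <= 1.0:
        raise ValueError("w_close must lie in [0, 1]")
    w_throat = 1.0 - w_close

    # Weighted sum of log-probabilities; the floor avoids log(0).
    log_scores = [
        w_close * math.log(max(a, floor)) + w_throat * math.log(max(b, floor))
        for a, b in zip(p_close, p_throat)
    ]

    # Subtract the maximum before exponentiating for numerical stability,
    # then renormalise so the result sums to one.
    top = max(log_scores)
    unnormalised = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical single frame over three classes (made-up numbers):
# the close-talk model is unsure, the throat model is confident.
close_talk = [0.40, 0.35, 0.25]
throat = [0.10, 0.80, 0.10]

combined = combine_posteriors(close_talk, throat)
best_class = max(range(len(combined)), key=combined.__getitem__)
```

With equal weights, the confident stream dominates the combined vector, which is the intended behaviour when one microphone is degraded by noise. In practice the weight would have to be tuned or estimated; setting w_close to 1.0 or 0.0 falls back to a single stream.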
Bibliographic reference. Dupont, Stéphane / Ris, Christophe / Bachelart, Damien (2004): "Combined use of close-talk and throat microphones for improved speech recognition under non-stationary background noise", In Robust2004, paper 31.