Interspeech'2005 - Eurospeech
This paper describes a speech enhancement system that significantly improves the intelligibility of noisy speech in the context of a speech coder in low SNR conditions. The system uses two state-of-the-art non-acoustic sensors: a general electromagnetic motion sensor (GEMS), which detects the internal motions of the glottis, and a physiological microphone (P-mic), which measures vibrations of the skin associated with speech. Both sensors are relatively immune to ambient acoustic noise, but each provides incomplete information about the speech. The proposed system combines the strengths of two algorithms: a perceptually motivated constant-Q (CQ) algorithm and an enhanced glottal correlation (GCORR) algorithm. The CQ algorithm employs a perceptually inspired signal detection technique to estimate the presence of speech cues in low SNR conditions. To reduce annoying artifacts, the enhancement gains are controlled by a state-dependent mechanism that discriminates among the distinct acoustic properties of each phoneme and by a psychoacoustic masking model. The enhanced glottal correlation algorithm extracts the desired speech signal from the noisy mixture using a modified estimate of the correlation between the speech signal and the glottal waveform supplied by the GEMS. Both subjective and objective experiments, performed in a variety of noise conditions, indicate an improvement relative to the EMSR algorithm.
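As a rough illustration of what "constant-Q" means in the abstract above, the sketch below builds a bank of analysis bands whose bandwidth grows in proportion to the center frequency, so that the ratio Q = center / bandwidth is the same for every band. This is not the paper's CQ detection algorithm, which the abstract does not specify; the function name `constant_q_bands`, its parameters, and the choice of geometrically spaced bands that just touch their neighbours are assumptions made for illustration.

```python
import math


def constant_q_bands(f_min, f_max, bands_per_octave):
    """Return (center_hz, bandwidth_hz) pairs with a constant Q factor.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Ratio between adjacent center frequencies (geometric spacing).
    step = 2.0 ** (1.0 / bands_per_octave)
    # Assume each band spans fc / sqrt(step) .. fc * sqrt(step), so that
    # adjacent bands meet edge to edge. Q = fc / bandwidth then depends
    # only on the spacing, not on fc.
    q = 1.0 / (math.sqrt(step) - 1.0 / math.sqrt(step))
    bands = []
    fc = f_min
    while fc <= f_max:
        bands.append((fc, fc / q))
        fc *= step
    return bands


if __name__ == "__main__":
    # One band per octave across a telephone-like range (assumed values).
    for center, bandwidth in constant_q_bands(100.0, 3200.0, 1):
        print(f"center = {center:7.1f} Hz, bandwidth = {bandwidth:7.1f} Hz")
```

Low-frequency bands come out narrow and high-frequency bands wide, which roughly mirrors the frequency resolution of human hearing and is the usual motivation for calling such an analysis perceptually motivated.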
Bibliographic reference. Hu, Rongqiang / Kamath, Sunil D. / Anderson, David V. (2005): "Speech enhancement using non-acoustic sensors", In INTERSPEECH-2005, 2305-2308.