11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Non-Audible Murmur Recognition Based on Fusion of Audio and Visual Streams

Panikos Heracleous, Norihiro Hagita

ATR-IRC, Japan

Non-Audible Murmur (NAM) is an unvoiced speech signal that can be received through the body tissue using special acoustic sensors (i.e., NAM microphones) attached behind the talker's ear. In a NAM microphone, body transmission and the loss of lip radiation act as a low-pass filter, so the higher-frequency components of a NAM signal are attenuated. Because of this spectral reduction, the unvoiced nature of NAM, and the type of articulation, NAM sounds become acoustically similar to one another, which causes more confusions than in normal speech. In this article, visual information extracted from the talker's facial movements is fused with NAM speech using three fusion methods, and phoneme classification experiments are conducted. The experimental results show a significant improvement when NAM speech and facial information are fused.
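The abstract does not say which three fusion methods were used. As a rough illustration only, the sketch below shows two generic ways of combining an audio stream with a visual stream for phoneme classification: early (feature-level) fusion, which concatenates per-frame feature vectors, and late (decision-level) fusion, which takes a weighted sum of per-phoneme log-likelihoods from separate audio and visual classifiers. The function names, the `audio_weight` parameter and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def early_fusion(audio: Sequence[float], visual: Sequence[float]) -> List[float]:
    """Feature-level fusion (hypothetical helper): concatenate the audio
    feature vector and the visual feature vector of one frame."""
    return list(audio) + list(visual)


def late_fusion(audio_loglik: Dict[str, float],
                visual_loglik: Dict[str, float],
                audio_weight: float) -> str:
    """Decision-level fusion (hypothetical helper): combine per-phoneme
    log-likelihoods from an audio and a visual classifier with stream
    weights that sum to 1, and return the best-scoring phoneme."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    visual_weight = 1.0 - audio_weight
    scores = {
        ph: audio_weight * audio_loglik[ph] + visual_weight * visual_loglik[ph]
        for ph in audio_loglik
    }
    # Pick the phoneme with the highest combined score.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy scores: the (low-pass, unvoiced) audio slightly prefers /n/,
    # while lip closure seen in the video strongly favours /m/.
    audio = {"m": -10.0, "n": -9.5}
    visual = {"m": -2.0, "n": -8.0}
    print(late_fusion(audio, visual, audio_weight=1.0))  # audio only
    print(late_fusion(audio, visual, audio_weight=0.5))  # fused
```

With the audio stream alone, the toy example picks /n/; giving the visual stream equal weight flips the decision to /m/, which is the kind of confusion that facial information can help resolve. In practice the stream weight would be tuned on held-out data.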


Bibliographic reference: Heracleous, Panikos / Hagita, Norihiro (2010): "Non-audible murmur recognition based on fusion of audio and visual streams", In INTERSPEECH-2010, 2706-2709.