15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Selection of Optimal Vocal Tract Regions Using Real-Time Magnetic Resonance Imaging for Robust Voice Activity Detection

Abhay Prasad (1), Prasanta Kumar Ghosh (2), Shrikanth S. Narayanan (3)

(1) Manipal Institute of Technology, India
(2) Indian Institute of Science, India
(3) University of Southern California, USA

Real-time magnetic resonance imaging (rtMRI) enables direct video capture of the moving vocal tract concurrently with the audio signal, providing valuable data for speech research. We consider a multimodal approach to voice activity detection (VAD) in rtMRI recordings that uses the audio signal as well as the MRI image sequence. The degraded quality of the audio recorded in the scanner motivates this multimodal scheme for robust VAD. A novel algorithm selects the optimal regions in the MRI image for performing VAD. VAD experiments using rtMRI data from two male and two female subjects show that VAD performance using optimally selected regions from the MRI images is comparable to that using the audio signal alone. The optimal regions turn out to be parts of the jaw, velum, glottis, and lips. When the audio is contaminated with additive noise at low SNR, VAD using the audio signal and the MRI image sequence together is significantly better than VAD using the audio alone (~14% absolute improvement in VAD accuracy).
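The noisy-audio condition mentioned above relies on mixing noise into clean audio at a chosen signal-to-noise ratio. The abstract does not state the noise type, the SNR levels, or the mixing procedure, so the following is only a minimal sketch of the standard definition, SNR (dB) = 10 log10(P_signal / P_noise). The function names (`mean_power`, `add_noise_at_snr`, `measured_snr_db`), the 200 Hz test tone, the Gaussian noise, and the 8 kHz sampling rate are all illustrative assumptions, not details from the paper.

```python
import math
import random


def mean_power(signal):
    """Average power (mean squared amplitude) of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def add_noise_at_snr(clean, noise, snr_db):
    """Scale `noise` and add it to `clean` so the mixture has the given SNR in dB.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    # snr_db = 10 * log10(p_clean / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture, given the clean reference signal."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Assumed test data: one second of a 200 Hz tone at 8 kHz plus Gaussian noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

noisy_0db = add_noise_at_snr(clean, noise, snr_db=0.0)
noisy_minus5db = add_noise_at_snr(clean, noise, snr_db=-5.0)
```

Scaling the noise rather than the speech keeps the clean signal level fixed across conditions, so any change in detection accuracy can be attributed to the noise level alone.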


Bibliographic reference. Prasad, Abhay / Ghosh, Prasanta Kumar / Narayanan, Shrikanth S. (2014): "Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection", in INTERSPEECH-2014, 1539-1543.