In this paper, we proposed a speaker-dependent VAD algorithm that extracts only the speech periods uttered by a target user. Based on our survey of recognition errors in real speech data collected through "VoiceTra", a speech-to-speech translation system for smartphones, we found many word insertion errors caused by background speakers' speech. Our VAD, which consists of three GMMs (a noise GMM and a speech GMM, as used in traditional GMM-based VAD, plus a speaker-adapted GMM), can easily be applied to detecting the target speaker's speech. Experiments on test utterances containing background speakers' speech demonstrated that an ASR system using the proposed VAD achieved better recognition performance than one using the conventional VAD.
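The frame-wise decision implied by the three-GMM structure can be sketched as follows. This is a minimal illustration, not the paper's implementation: the diagonal-covariance GMM scoring, the toy parameters, and the rule "label a frame as target speech when the speaker-adapted GMM outscores both the noise GMM and the generic speech GMM" are assumptions made here for clarity.

```python
import numpy as np

def gmm_loglik(x, weights, means, vars_):
    """Log-likelihood of frame x under a diagonal-covariance GMM.

    x: (D,) feature vector; weights: (M,); means, vars_: (M, D).
    """
    diff = x - means  # (M, D) deviation from each mixture mean
    log_comp = (
        np.log(weights)
        - 0.5 * np.sum(np.log(2 * np.pi * vars_), axis=1)
        - 0.5 * np.sum(diff ** 2 / vars_, axis=1)
    )
    m = log_comp.max()  # log-sum-exp for numerical stability
    return m + np.log(np.exp(log_comp - m).sum())

def target_speaker_vad(frames, noise_gmm, speech_gmm, speaker_gmm):
    """Label each frame True iff the speaker-adapted GMM scores highest.

    Each *_gmm is a (weights, means, vars_) tuple (assumed layout).
    """
    decisions = []
    for x in frames:
        ll_noise = gmm_loglik(x, *noise_gmm)
        ll_speech = gmm_loglik(x, *speech_gmm)
        ll_speaker = gmm_loglik(x, *speaker_gmm)
        decisions.append(ll_speaker > ll_noise and ll_speaker > ll_speech)
    return decisions
```

With single-mixture toy models centered on different points in feature space, frames near the speaker-adapted model's mean are labeled target speech, while frames near the noise or generic-speech means are rejected; in practice the GMMs would be trained on noise, pooled speech, and speaker-adaptation data respectively.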
Index Terms: voice activity detection, speech recognition
Bibliographic reference. Matsuda, Shigeki / Ito, Naoya / Tsujino, Kosuke / Kashioka, Hideki / Sagayama, Shigeki (2012): "Speaker-dependent voice activity detection robust to background speech noise", In INTERSPEECH-2012, 2626-2629.