International Workshop on Hands-Free Speech Communication (HSC2001)

April 9-11, 2001
Kyoto, Japan

Voice Activity Detection Using Non-Speech Models and HMM Composition

Takeshi Yamada (1), Narimasa Watanabe (1), Futoshi Asano (2), Nobuhiko Kitawaki (1)

(1) University of Tsukuba, Japan
(2) Electrotechnical Laboratory, Tsukuba, Japan

To realize robust voice activity detection (VAD) in real acoustic environments, this paper proposes a new VAD method that uses non-speech (environment sound) models and HMM composition. The method predicts the environment sound that overlaps the speech, composes the speech model with the model of the predicted environment sound, and detects the mixture sound period using the composed models. Restricting the number of combinations of the speech model and the environment sound models makes the search efficient and reliable. Evaluation experiments confirmed that the proposed method can effectively detect the mixture sound period.
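
The flow described above (predict the overlapping environment sound, compose it with the speech model, then detect the mixture period) can be illustrated with a minimal sketch. It is not the authors' implementation. As simplifying assumptions, each model is reduced to a single diagonal Gaussian over log-spectral features rather than a full HMM, composition uses the common log-add approximation on the means, and the composed model reuses the speech variance. All function and variable names (`log_add_compose`, `detect_mixture`, `env_models`, and so on) are hypothetical.

```python
import math

def log_add_compose(speech_mean, env_mean):
    """Compose two log-spectral means by adding them in the linear domain
    (log-add approximation; an assumption, not the paper's exact rule)."""
    return [math.log(math.exp(s) + math.exp(n))
            for s, n in zip(speech_mean, env_mean)]

def log_likelihood(frame, mean, var):
    """Diagonal-Gaussian log-likelihood of one feature frame."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))

def detect_mixture(frames, speech, env_models):
    """Return one boolean per frame: True when the composed
    (speech + predicted environment sound) model scores best."""
    speech_mean, speech_var = speech
    predicted = None  # environment sound expected to overlap the speech
    labels = []
    for frame in frames:
        # Score every non-speech (environment sound) model on this frame.
        env_scores = {name: log_likelihood(frame, m, v)
                      for name, (m, v) in env_models.items()}
        best_env = max(env_scores, key=env_scores.get)
        if predicted is None:
            predicted = best_env
        # Restrict the search: compose the speech model only with the
        # predicted environment sound, not with every environment model.
        env_mean, _ = env_models[predicted]
        composed_mean = log_add_compose(speech_mean, env_mean)
        composed_score = log_likelihood(frame, composed_mean, speech_var)
        is_mixture = composed_score > env_scores[best_env]
        labels.append(is_mixture)
        if not is_mixture:
            # Refresh the prediction from the latest non-speech frame.
            predicted = best_env
    return labels

# Toy data (invented for illustration): two environment sounds, one speech model.
env_models = {"fan": ([1.0, 1.0], [0.1, 0.1]),
              "door": ([3.0, 0.5], [0.1, 0.1])}
speech = ([4.0, 4.0], [0.5, 0.5])
frames = [[1.0, 1.0], [1.1, 0.9], [4.05, 4.05], [1.0, 1.0]]
labels = detect_mixture(frames, speech, env_models)
print(labels)
```

In this toy run only the third frame, which lies close to the composed speech-plus-fan mean, is labeled as a mixture frame. Composing with the single predicted environment model keeps the number of candidate models per frame at the number of environment models plus one, which is the kind of restriction on combinations the abstract refers to.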



Bibliographic reference.  Yamada, Takeshi / Watanabe, Narimasa / Asano, Futoshi / Kitawaki, Nobuhiko (2001): "Voice activity detection using non-speech models and HMM composition", In HSC2001, 131-134.