11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Voice Activity Detection Based on Conditional Random Fields Using Multiple Features

Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Nagoya Institute of Technology, Japan

This paper proposes a Voice Activity Detection (VAD) algorithm based on Conditional Random Fields (CRF) using multiple features. VAD is a technique for distinguishing between speech and non-speech in noisy environments, and it is an important component of many real-world speech applications. In the proposed method, the posterior probability of the output labels is modeled directly by the weighted sum of the feature functions. By estimating appropriate weight parameters, effective features are selected automatically to improve VAD performance. Experimental results on the CENSREC-1-C database show that the proposed method can decrease error rates by using conditional random fields.
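
As an illustration of the modeling idea in the abstract, the following is a minimal sketch of a generic linear-chain CRF posterior. It is not the authors' implementation. It assumes the usual log-linear form, in which the weighted sum of feature functions is exponentiated and normalized over all label sequences. The two per-frame features, the label encoding (0 = non-speech, 1 = speech), and all weight values are hypothetical and chosen only for illustration. A zero weight shows how a feature can be effectively ignored once the weights are estimated.

```python
import itertools
import math

LABELS = (0, 1)  # assumed encoding: 0 = non-speech, 1 = speech

def score(labels, frames, w_obs, w_trans):
    """Weighted sum of feature functions for one candidate label sequence."""
    total = 0.0
    for t, y in enumerate(labels):
        # Observation features: each frame feature times a label-specific weight.
        total += sum(w * x for w, x in zip(w_obs[y], frames[t]))
        # Transition feature: one weight per (previous label, current label) pair.
        if t > 0:
            total += w_trans[labels[t - 1]][y]
    return total

def posterior(labels, frames, w_obs, w_trans):
    """P(labels | frames), normalized by brute force over all label sequences.

    Enumeration is exponential in the number of frames; a practical system
    would use the forward algorithm instead.
    """
    all_seqs = itertools.product(LABELS, repeat=len(frames))
    z = sum(math.exp(score(s, frames, w_obs, w_trans)) for s in all_seqs)
    return math.exp(score(labels, frames, w_obs, w_trans)) / z

def best_sequence(frames, w_obs, w_trans):
    """Most probable label sequence (brute force, for short inputs only)."""
    all_seqs = itertools.product(LABELS, repeat=len(frames))
    return max(all_seqs, key=lambda s: score(s, frames, w_obs, w_trans))

# Hypothetical data: three frames, two features per frame.
FRAMES = [(-2.0, 0.3), (1.5, -0.7), (2.0, 0.1)]
# Hypothetical weights: the second feature has weight 0 for both labels,
# so it does not influence the decision.
W_OBS = [[-1.0, 0.0], [1.0, 0.0]]
# Hypothetical transition weights that favor staying in the same label.
W_TRANS = [[0.5, -0.5], [-0.5, 0.5]]

if __name__ == "__main__":
    labels = best_sequence(FRAMES, W_OBS, W_TRANS)
    print(labels, posterior(labels, FRAMES, W_OBS, W_TRANS))
```

In this sketch the weights are fixed by hand. In a trained CRF they would be estimated from labeled data, and features that receive weights near zero contribute little to the speech/non-speech decision.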

Bibliographic reference. Saito, Akira / Nankaku, Yoshihiko / Lee, Akinobu / Tokuda, Keiichi (2010): "Voice activity detection based on conditional random fields using multiple features", In INTERSPEECH-2010, 2086-2089.