INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Evaluation of Voice Activity and Voicing Detection

Bojan Kotnik (1), Pierre Sendorek (2), Sergey Astrov (3), Turgay Koc (4), Tolga Ciloglu (4), Laura Docío Fernández (5), Eduardo Rodríguez Banga (5), Harald Höge (3), Zdravko Kačič (1)

(1) University of Maribor, Slovenia; (2) Telecom Paristech, France; (3) Siemens AG, Germany; (4) Middle East Technical University, Turkey; (5) University of Vigo, Spain

This paper describes the ECESS evaluation campaign for voice activity and voicing detection. Standard voice activity detection (VAD) classifies a signal into speech and non-speech; we extend it to VAD+, which classifies a signal as a sequence of non-speech, voiced, and unvoiced segments. The evaluation is performed on a portion of the Spanish SPEECON database with manually labeled segmentation. To avoid errors caused by the limited precision of manual labeling, we introduce "dead zones": tolerance intervals of ±5 ms around label changes in the data set. The signal is not evaluated within these tolerance intervals.
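As an illustration of the dead-zone idea, the following is a minimal sketch and not the campaign's actual scoring tool. It assumes that reference and hypothesis segmentations are contiguous lists of (start_ms, end_ms, label) tuples, that the signal is scored at a fixed time step, and that a dead zone includes its end points. The function names, the label symbols "N", "U", and "V", the 1 ms step, and the simple mismatch rate are all hypothetical choices made for this example.

```python
# Labels for the three VAD+ classes (symbols are an assumption of this sketch).
NON_SPEECH, UNVOICED, VOICED = "N", "U", "V"


def label_at(segments, t_ms):
    """Return the label of the segment covering time t_ms, or None."""
    for start, end, label in segments:
        if start <= t_ms < end:
            return label
    return None


def in_dead_zone(ref_segments, t_ms, tol_ms=5):
    """True if t_ms lies within tol_ms of a label change in the reference.

    Assumes contiguous segments, so each segment's end is the next one's start.
    """
    pairs = zip(ref_segments, ref_segments[1:])
    for (_, end, label), (_, _, next_label) in pairs:
        if label != next_label and abs(t_ms - end) <= tol_ms:
            return True
    return False


def error_rate(ref, hyp, step_ms=1, tol_ms=5):
    """Fraction of scored time points where hyp disagrees with ref.

    Time points inside a dead zone are skipped and do not count at all.
    """
    total = errors = 0
    for t in range(0, ref[-1][1], step_ms):
        if in_dead_zone(ref, t, tol_ms):
            continue
        total += 1
        if label_at(ref, t) != label_at(hyp, t):
            errors += 1
    return errors / total if total else 0.0


ref = [(0, 100, NON_SPEECH), (100, 200, VOICED), (200, 300, UNVOICED)]
# Boundary placed 3 ms late: the disagreement falls entirely in a dead zone.
hyp_close = [(0, 103, NON_SPEECH), (103, 200, VOICED), (200, 300, UNVOICED)]
# Boundary placed 10 ms late: part of the disagreement is outside the dead zone.
hyp_far = [(0, 110, NON_SPEECH), (110, 200, VOICED), (200, 300, UNVOICED)]

print(error_rate(ref, hyp_close))
print(error_rate(ref, hyp_far))
```

In this sketch, a hypothesis boundary that deviates from the manual label by less than the tolerance is not penalized, while a larger deviation is penalized only for the portion that lies outside the dead zone.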


Bibliographic reference. Kotnik, Bojan / Sendorek, Pierre / Astrov, Sergey / Koc, Turgay / Ciloglu, Tolga / Fernández, Laura Docío / Banga, Eduardo Rodríguez / Höge, Harald / Kačič, Zdravko (2008): "Evaluation of voice activity and voicing detection", in INTERSPEECH-2008, 1642-1645.