16th Annual Conference of the International Speech Communication Association

Dresden, Germany
September 6-10, 2015

Comparison of Forced-Alignment Speech Recognition and Humans for Generating Reference VAD

Ivan Kraljevski (1), Zheng-Hua Tan (2), Maria Paola Bissiri (3)

(1) voice INTER connect, Germany
(2) Aalborg University, Denmark
(3) Queen Margaret University, UK

This paper investigates whether forced-alignment speech recognition can serve as an alternative to human annotators for generating reference Voice Activity Detection (VAD) transcriptions. The level of agreement between the automatic and manual VAD transcriptions and the reference transcriptions produced by a human expert was assessed. Statistical analysis was then applied to the automatically produced and the collected manual transcriptions. The experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.
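
To make the notion of agreement between a VAD transcription and a reference concrete, the following is a minimal sketch and not the evaluation procedure used in the paper. The abstract does not state which agreement measure was applied. The sketch assumes that each transcription is a list of (start, end) speech segments in seconds, that a 10 ms frame shift is used, and that agreement is summarized by the frame-level agreement rate and Cohen's kappa. The function names segments_to_frames and frame_agreement are hypothetical.

```python
def segments_to_frames(segments, duration, frame_shift=0.01):
    """Convert (start, end) speech segments in seconds to 0/1 frame labels.

    Assumption: a fixed frame shift (10 ms by default); a frame is marked
    as speech (1) if it lies inside any segment, otherwise non-speech (0).
    """
    n_frames = int(round(duration / frame_shift))
    labels = [0] * n_frames
    for start, end in segments:
        first = max(0, int(round(start / frame_shift)))
        last = min(n_frames, int(round(end / frame_shift)))
        for i in range(first, last):
            labels[i] = 1
    return labels


def frame_agreement(ref, hyp):
    """Return (observed agreement, Cohen's kappa) for two 0/1 label lists.

    Assumption: both lists cover the same recording and have equal length.
    """
    if len(ref) != len(hyp) or not ref:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(ref)
    observed = sum(r == h for r, h in zip(ref, hyp)) / n
    # Chance agreement from the marginal speech/non-speech proportions.
    p_ref = sum(ref) / n
    p_hyp = sum(hyp) / n
    expected = p_ref * p_hyp + (1 - p_ref) * (1 - p_hyp)
    if expected == 1.0:
        # Both sequences are constant and identical; kappa is undefined,
        # so perfect agreement is reported by convention.
        return observed, 1.0
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical example: expert reference vs. an automatic transcription
# for a 1-second recording.
reference = segments_to_frames([(0.20, 0.60)], duration=1.0)
automatic = segments_to_frames([(0.25, 0.60)], duration=1.0)
observed, kappa = frame_agreement(reference, automatic)
print(f"agreement={observed:.3f}, kappa={kappa:.3f}")
```

Kappa is included alongside the raw agreement rate because recordings dominated by speech or by silence can yield a high raw agreement by chance alone.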

Bibliographic reference. Kraljevski, Ivan / Tan, Zheng-Hua / Bissiri, Maria Paola (2015): "Comparison of forced-alignment speech recognition and humans for generating reference VAD", in INTERSPEECH-2015, 2937-2941.