ISCA Archive Interspeech 2015

Comparison of forced-alignment speech recognition and humans for generating reference VAD

Ivan Kraljevski, Zheng-Hua Tan, Maria Paola Bissiri

This paper aims to answer the question of whether forced-alignment speech recognition can be used as an alternative to human annotators in generating reference Voice Activity Detection (VAD) transcriptions. The level of agreement between automatic and manual VAD transcriptions and the reference transcriptions produced by a human expert was investigated. Statistical analysis was then applied to the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.
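The abstract does not state how the level of agreement was quantified. Purely as an illustration, here is a minimal Python sketch that assumes three things. First, each VAD transcription is a list of (start, end) speech segments in seconds. Second, the labels are compared on a 10 ms frame grid. Third, agreement is measured as raw frame agreement plus Cohen's kappa. These choices, and the helper names `segments_to_frames` and `agreement`, are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation (not from the paper): a speech region as (start_s, end_s).
Segment = Tuple[float, float]


def segments_to_frames(segments: List[Segment], duration: float,
                       frame_shift: float = 0.01) -> List[int]:
    """Convert speech segments to per-frame labels (1 = speech, 0 = non-speech).

    The 10 ms frame shift is an assumed default, not a value from the paper.
    """
    n_frames = int(round(duration / frame_shift))
    labels = [0] * n_frames
    for start, end in segments:
        first = max(0, int(round(start / frame_shift)))
        last = min(n_frames, int(round(end / frame_shift)))
        for i in range(first, last):
            labels[i] = 1
    return labels


def agreement(a: List[int], b: List[int]) -> Tuple[float, float]:
    """Return (raw frame agreement, Cohen's kappa) for two binary label sequences.

    Cohen's kappa is an assumed choice of chance-corrected agreement measure.
    """
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    pa, pb = sum(a) / n, sum(b) / n              # speech rate of each labeller
    p_e = pa * pb + (1 - pa) * (1 - pb)          # agreement expected by chance
    # If p_e == 1 both sequences are constant and identical, so treat as perfect agreement.
    kappa = 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)
    return p_o, kappa


if __name__ == "__main__":
    # Made-up toy segments, for illustration only.
    reference = segments_to_frames([(0.5, 1.5), (2.0, 3.0)], duration=4.0)
    automatic = segments_to_frames([(0.52, 1.5), (2.0, 2.9)], duration=4.0)
    p_o, kappa = agreement(reference, automatic)
    print(f"frame agreement = {p_o:.3f}, kappa = {kappa:.3f}")
```

In this toy example the automatic labels start 20 ms late and end 100 ms early. That gives 12 mismatching frames out of 400, so the frame agreement is 0.97 and kappa is 0.94. Kappa is included because raw agreement can look high for recordings that are mostly speech or mostly silence.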


doi: 10.21437/Interspeech.2015-454

Cite as: Kraljevski, I., Tan, Z.-H., Bissiri, M.P. (2015) Comparison of forced-alignment speech recognition and humans for generating reference VAD. Proc. Interspeech 2015, 2937-2941, doi: 10.21437/Interspeech.2015-454

@inproceedings{kraljevski15_interspeech,
  author={Ivan Kraljevski and Zheng-Hua Tan and Maria Paola Bissiri},
  title={{Comparison of forced-alignment speech recognition and humans for generating reference VAD}},
  year=2015,
  booktitle={Proc. Interspeech 2015},
  pages={2937--2941},
  doi={10.21437/Interspeech.2015-454}
}