VCTUBE : A Library for Automatic Speech Data Annotation

Seong Choi, Seunghoon Jeong, Jeewoo Yoon, Migyeong Yang, Minsam Ko, Eunil Park, Jinyoung Han, Munyoung Lee, Seonghee Lee


We introduce VCTUBE, an open-source Python library that automatically generates <audio, text> pairs of speech data from a given YouTube URL. We believe VCTUBE is useful for easily collecting, processing, and annotating speech data for the development of speech synthesis systems.


Cite as: Choi, S., Jeong, S., Yoon, J., Yang, M., Ko, M., Park, E., Han, J., Lee, M., Lee, S. (2020) VCTUBE : A Library for Automatic Speech Data Annotation. Proc. Interspeech 2020, 1013-1014.


@inproceedings{Choi2020,
  author={Seong Choi and Seunghoon Jeong and Jeewoo Yoon and Migyeong Yang and Minsam Ko and Eunil Park and Jinyoung Han and Munyoung Lee and Seonghee Lee},
  title={{VCTUBE : A Library for Automatic Speech Data Annotation}},
  year=2020,
  booktitle={Proc. Interspeech 2020},
  pages={1013--1014}
}