ISCA Archive Interspeech 2009

The semi-supervised switchboard transcription project

Amarnag Subramanya, Jeff Bilmes

In previous work, we proposed a new graph-based semi-supervised learning (SSL) algorithm and showed that it outperforms other state-of-the-art SSL approaches for classifying documents and web pages. Here we use a multi-threaded implementation to scale the algorithm to very large data sets. We treat the phonetically annotated portion of the Switchboard Transcription Project (STP) as labeled data and automatically annotate, at the phonetic level, the Switchboard I (SWB) training set. We show that our proposed approach outperforms state-of-the-art SSL algorithms as well as a state-of-the-art strictly supervised classifier. As a result, we have STP-style annotations of the entire SWB-I training set, which we refer to as semi-supervised STP (S3TP).
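
For readers unfamiliar with graph-based SSL, the following is a minimal sketch of generic iterative label propagation. It is not the algorithm proposed by the authors, whose details are not given in this abstract. It only illustrates the general idea of spreading labels from a few annotated nodes to unannotated ones over a weighted similarity graph. The function name `propagate_labels`, the toy graph, and the edge weights are all hypothetical; in a setting like the one described above, nodes might correspond to speech frames and classes to phones.

```python
def propagate_labels(weights, seed_labels, num_classes, num_iters=50):
    """Generic label propagation on a weighted graph (illustrative only).

    weights: dict mapping node -> dict of neighbor -> non-negative edge weight
    seed_labels: dict mapping labeled node -> class index (kept fixed)
    Returns a dict mapping node -> list of class probabilities.
    """
    # Labeled nodes start as one-hot distributions, unlabeled ones as uniform.
    dist = {}
    for node in weights:
        if node in seed_labels:
            dist[node] = [1.0 if c == seed_labels[node] else 0.0
                          for c in range(num_classes)]
        else:
            dist[node] = [1.0 / num_classes] * num_classes

    for _ in range(num_iters):
        new_dist = {}
        for node, nbrs in weights.items():
            total = sum(nbrs.values())
            if node in seed_labels or total == 0:
                # Clamp labeled nodes; leave isolated nodes unchanged.
                new_dist[node] = dist[node]
                continue
            # Each unlabeled node takes the weighted average of its neighbors.
            new_dist[node] = [
                sum(w * dist[n][c] for n, w in nbrs.items()) / total
                for c in range(num_classes)
            ]
        dist = new_dist
    return dist


# Hypothetical toy graph: a chain a - b - c - d with a weak link between b and c.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 0.1},
    "c": {"b": 0.1, "d": 1.0},
    "d": {"c": 1.0},
}
seeds = {"a": 0, "d": 1}  # only the two end nodes are labeled
result = propagate_labels(graph, seeds, num_classes=2)
predicted = {n: max(range(2), key=lambda c: result[n][c]) for n in graph}
```

In this toy example the unlabeled node `b` ends up with the label of its strongly connected neighbor `a`, and `c` with that of `d`, because the weak `b`-`c` edge carries little influence.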

doi: 10.21437/Interspeech.2009-554

Cite as: Subramanya, A., Bilmes, J. (2009) The semi-supervised switchboard transcription project. Proc. Interspeech 2009, 1915-1918, doi: 10.21437/Interspeech.2009-554

  author={Amarnag Subramanya and Jeff Bilmes},
  title={{The semi-supervised switchboard transcription project}},
  booktitle={Proc. Interspeech 2009},