10th Annual Conference of the International Speech Communication Association

Brighton, United Kingdom
September 6-10, 2009

The Semi-Supervised Switchboard Transcription Project

Amarnag Subramanya, Jeff Bilmes

University of Washington, USA

In previous work, we proposed a new graph-based semi-supervised learning (SSL) algorithm and showed that it outperforms other state-of-the-art SSL approaches for classifying documents and web pages. Here we use a multi-threaded implementation to scale the algorithm to very large data sets. We treat the phonetically annotated portion of the Switchboard Transcription Project (STP) as labeled data and automatically annotate the Switchboard I (SWB) training set at the phonetic level. We show that our proposed approach outperforms state-of-the-art SSL algorithms as well as a state-of-the-art strictly supervised classifier. As a result, we have STP-style annotations of the entire SWB-I training set, which we refer to as semi-supervised STP (S3TP).
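
For readers unfamiliar with graph-based SSL, the general idea can be illustrated with a minimal sketch. The abstract does not specify the authors' algorithm, so the code below substitutes plain iterative label propagation with clamped labeled nodes, a standard and much simpler technique. The function name, the node names, and the four-node chain graph are all hypothetical. In a phonetic annotation setting one might think of nodes as speech frames and classes as phone labels.

```python
# Generic label propagation on a small weighted graph.
# Illustrative only: this is NOT the algorithm from the paper.

def propagate_labels(weights, seed_labels, num_classes, iterations=50):
    """Spread class distributions from labeled to unlabeled nodes.

    weights:     dict mapping node -> {neighbor: edge weight}
    seed_labels: dict mapping labeled node -> class index
    Returns a dict mapping node -> list of class probabilities.
    """
    dist = {}
    for node in weights:
        if node in seed_labels:
            # Labeled nodes start (and stay) one-hot.
            dist[node] = [1.0 if c == seed_labels[node] else 0.0
                          for c in range(num_classes)]
        else:
            # Unlabeled nodes start uniform.
            dist[node] = [1.0 / num_classes] * num_classes

    for _ in range(iterations):
        updated = {}
        for node, neighbors in weights.items():
            total = sum(neighbors.values())
            if node in seed_labels or total == 0:
                updated[node] = dist[node]  # clamp labeled or isolated nodes
                continue
            # Weighted average of the neighbors' current distributions.
            updated[node] = [
                sum(w * dist[nb][c] for nb, w in neighbors.items()) / total
                for c in range(num_classes)
            ]
        dist = updated
    return dist


# Hypothetical chain a - b - c - d with unit weights; only the ends are labeled.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
result = propagate_labels(graph, {"a": 0, "d": 1}, num_classes=2)
predicted = {node: max(range(2), key=lambda c: p[c]) for node, p in result.items()}
```

In this toy graph each unlabeled node ends up with the class of the labeled node it is closer to. Because every node update within an iteration reads only the previous iteration's values, the inner loop over nodes could in principle be split across threads, which is one plausible way such a method might be scaled to large data sets.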

Bibliographic reference. Subramanya, Amarnag / Bilmes, Jeff (2009): "The semi-supervised switchboard transcription project", In INTERSPEECH-2009, 1915-1918.