ISCA Archive Interspeech 2009

Unsupervised training of an HMM-based speech recognizer for topic classification

Herbert Gish, Man-hung Siu, Arthur Chan, Bill Belfield

HMM-based Speech-To-Text (STT) systems are widely deployed not only for dictation tasks but also as the first processing stage of many automatic speech applications such as spoken topic classification. However, the need for transcribed data to train the HMMs precludes their use in domains where transcribed speech is difficult to come by because of the specific domain, channel or language. In this work, we propose building HMM-based speech recognizers without transcribed data by formulating HMM training as an optimization over both the parameter space and the transcription sequence space. We describe how this can be easily implemented using existing STT tools. We tested the effectiveness of our unsupervised training approach on the task of topic classification on the Switchboard corpus. The unsupervised HMM recognizer, initialized with a segmental tokenizer, outperformed both an HMM phoneme recognizer trained with 1 hour of transcribed data and the Brno University of Technology (BUT) Hungarian phoneme recognizer. This approach can also be applied to other speech applications, including spoken term detection, language and speaker verification.
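
The idea of optimizing jointly over parameters and transcriptions can be illustrated with a minimal sketch. It is a toy stand-in and not the system described in the abstract: the "units" are single one-dimensional Gaussians with a shared variance, there is no HMM topology or language model, and every name here (`decode`, `reestimate`, `train_unsupervised`, `init_means`) is hypothetical. The loop alternates between a transcription step (label the data with the current models) and a parameter step (refit the models to those labels), starting from an initial guess that plays the role of the initial tokenizer.

```python
import random

def decode(frames, means):
    """Transcription step: label each frame with the closest unit (hard assignment)."""
    return [min(range(len(means)), key=lambda k: (x - means[k]) ** 2) for x in frames]

def reestimate(frames, labels, means):
    """Parameter step: refit each unit's mean to the frames currently labeled with it."""
    new_means = list(means)
    for k in range(len(means)):
        assigned = [x for x, lab in zip(frames, labels) if lab == k]
        if assigned:  # keep the old mean if a unit received no frames
            new_means[k] = sum(assigned) / len(assigned)
    return new_means

def train_unsupervised(frames, init_means, max_iters=20):
    """Alternate the two steps until the labeling stops changing."""
    means = list(init_means)
    labels = decode(frames, means)
    for _ in range(max_iters):
        means = reestimate(frames, labels, means)
        new_labels = decode(frames, means)
        if new_labels == labels:
            break
        labels = new_labels
    return means, labels

# Synthetic "audio": two well-separated clusters of frames, no labels given.
rng = random.Random(0)
frames = [rng.gauss(0.0, 0.3) for _ in range(50)] + [rng.gauss(5.0, 0.3) for _ in range(50)]
means, labels = train_unsupervised(frames, init_means=[1.0, 4.0])
```

In a real recognizer the transcription step would be a full decoding pass and the parameter step a standard HMM re-estimation pass, which is presumably why such a loop can be assembled from existing STT tools.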

doi: 10.21437/Interspeech.2009-559

Cite as: Gish, H., Siu, M.-h., Chan, A., Belfield, B. (2009) Unsupervised training of an HMM-based speech recognizer for topic classification. Proc. Interspeech 2009, 1935-1938, doi: 10.21437/Interspeech.2009-559

  author={Herbert Gish and Man-hung Siu and Arthur Chan and Bill Belfield},
  title={{Unsupervised training of an HMM-based speech recognizer for topic classification}},
  booktitle={Proc. Interspeech 2009},