9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

Automatically Learning Speaker-Independent Acoustic Subword Units

Balakrishnan Varadarajan, Sanjeev Khudanpur

Johns Hopkins University, USA

We investigate methods for unsupervised learning of sub-word acoustic units of a language directly from speech. We previously demonstrated that the states of a hidden Markov model "grown" using a novel modification of the maximum-likelihood successive state splitting algorithm correspond closely to the phones of the language [1]. The correspondence between the Viterbi state sequence for unseen speech from the training speaker and the phone transcription of that speech exceeds 85%, and generalizes to a large extent (~61%) to speech from a different speaker. Furthermore, unsupervised adaptation via MLLR bridges more than half of the gap between the speaker-dependent and cross-speaker correspondence of the automatically learned units to phones (~73% accuracy).
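The core of successive state splitting is a greedy criterion: at each step, split the state whose data gain the most likelihood from being modeled by two distributions instead of one. A minimal one-dimensional sketch of that criterion is below; it is an illustrative assumption, not the paper's algorithm, which operates on full HMM states with contextual and temporal split types and multivariate Gaussians. The mean-threshold split and the function names here are hypothetical simplifications.

```python
import math

def gaussian_loglik(xs):
    # Log-likelihood of 1-D data under its maximum-likelihood Gaussian fit.
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    var = max(var, 1e-6)  # variance floor to avoid degenerate splits
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def split_gain(xs):
    # Gain in log-likelihood from splitting a state's data at its mean
    # into two Gaussians (a crude stand-in for a real split hypothesis).
    mu = sum(xs) / len(xs)
    left = [x for x in xs if x < mu]
    right = [x for x in xs if x >= mu]
    if not left or not right:
        return 0.0, (xs, [])
    gain = gaussian_loglik(left) + gaussian_loglik(right) - gaussian_loglik(xs)
    return gain, (left, right)

def grow_states(states, max_states):
    # Greedily split the state with the largest likelihood gain until
    # the state budget is reached or no split helps.
    states = list(states)
    while len(states) < max_states:
        gains = [split_gain(s) for s in states]
        best = max(range(len(states)), key=lambda i: gains[i][0])
        if gains[best][0] <= 0.0:
            break
        left, right = gains[best][1]
        states[best:best + 1] = [left, right]
    return states
```

For clearly bimodal data such as `[0.0, 0.1, 0.2, 5.0, 5.1, 5.2]`, the split at the mean yields a large positive gain, so a single state grows into two, each capturing one mode.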


[1] B. Varadarajan, S. Khudanpur, and E. Dupoux, "Unsupervised learning of acoustic sub-word units," in Proceedings of ACL-08: HLT, Short Papers. Columbus, Ohio: Association for Computational Linguistics, June 2008, pp. 165-168.


Bibliographic reference.  Varadarajan, Balakrishnan / Khudanpur, Sanjeev (2008): "Automatically learning speaker-independent acoustic subword units", In INTERSPEECH-2008, 1333-1336.