This paper describes a new approach to unsupervised acoustic modeling, that is, building acoustic models for phoneme-like sub-word units from untranscribed speech data. The proposed approach is based on Gaussian component clustering. First, a large set of Gaussian components is estimated from the untranscribed data. Clustering is then performed to group these Gaussian components, and each resulting cluster forms an acoustic model for an induced sub-word unit. We define several similarity measures between Gaussian components and investigate several graph-based clustering algorithms. Experiments on the TIMIT corpus demonstrate the effectiveness of our approach.
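The pipeline outlined in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measures or graph-based clustering algorithms were used, so everything specific here is an assumption made for illustration: diagonal-covariance Gaussians, a similarity of exp(-symmetric KL divergence), and clusters taken as the connected components of a thresholded similarity graph. The names `symmetric_kl`, `similarity`, and `cluster_components` are hypothetical, and the estimation step is skipped (the components are written by hand rather than fitted to speech data).

```python
import math
from typing import List, Tuple

# A diagonal-covariance Gaussian: (mean vector, variance vector).
# Assumption: the abstract does not state the covariance structure.
Gaussian = Tuple[List[float], List[float]]


def symmetric_kl(a: Gaussian, b: Gaussian) -> float:
    """Symmetric KL divergence KL(a||b) + KL(b||a) for diagonal Gaussians."""
    (mu_a, var_a), (mu_b, var_b) = a, b
    total = 0.0
    for ma, va, mb, vb in zip(mu_a, var_a, mu_b, var_b):
        diff2 = (ma - mb) ** 2
        # The log-determinant terms of the two directions cancel out.
        total += 0.5 * (va / vb + vb / va - 2.0)
        total += 0.5 * diff2 * (1.0 / va + 1.0 / vb)
    return total


def similarity(a: Gaussian, b: Gaussian) -> float:
    """Illustrative similarity in (0, 1]; equals 1 for identical Gaussians."""
    return math.exp(-symmetric_kl(a, b))


def cluster_components(components: List[Gaussian],
                       threshold: float) -> List[List[int]]:
    """Link components whose similarity reaches the threshold, then
    return the connected components of that graph as clusters."""
    n = len(components)
    parent = list(range(n))  # union-find forest over component indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity(components[i], components[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hand-written 1-D components standing in for estimated ones.
components = [([0.0], [1.0]), ([0.2], [1.0]), ([5.0], [1.0]), ([5.1], [1.2])]
print(cluster_components(components, threshold=0.5))  # [[0, 1], [2, 3]]
```

In this toy setting each returned index list plays the role of one induced sub-word unit. Thresholded connected components is only the simplest graph-based grouping, chosen here to keep the sketch short.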
Bibliographic reference. Wang, Haipeng / Lee, Tan / Leung, Cheung-Chi / Ma, Bin / Li, Haizhou (2014): "A graph-based Gaussian component clustering approach to unsupervised acoustic modeling", In INTERSPEECH-2014, 875-879.