13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Group Sparse Hidden Markov Models for Speech Recognition

Jen-Tzung Chien, Cheng-Chun Chiang

Department of Electrical and Computer Engineering, National Chiao Tung University, Hsinchu, Taiwan

This paper presents group sparse hidden Markov models (GS-HMMs), in which a sequence of acoustic features is driven by a Markov chain and each feature vector is represented by two groups of basis vectors. The group of common bases represents the features across states within an HMM. The group of individual bases compensates for the intra-state residual information. Importantly, the sparse prior for the sensing weights is controlled by the Laplacian scale mixture (LSM) distribution, which is obtained by multiplying a Laplacian variable by an inverse Gamma variable. The scale mixture parameter in the LSM makes the distribution even sparser. This parameter serves as an automatic relevance determination mechanism for selecting relevant bases from the two groups. The weights and the two sets of bases in GS-HMMs are estimated via Bayesian learning. We apply this framework to acoustic modeling and show the robustness of GS-HMMs for speech recognition in the presence of different noise types and SNRs.
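The LSM construction described in the abstract can be illustrated with a small sampling sketch. This is not the authors' implementation: the Gamma shape and rate values, the unit Laplacian scale, the tail threshold, and all function names are assumptions chosen for illustration. The sketch draws a Laplacian variable, divides it by a Gamma-distributed inverse scale (equivalently, multiplies it by an inverse Gamma variable), and compares the tail mass with that of a plain Laplacian.

```python
import random


def sample_laplace(rng, scale=1.0):
    # A zero-mean Laplacian variate is a random sign times an exponential variate.
    magnitude = rng.expovariate(1.0 / scale)
    return magnitude if rng.random() < 0.5 else -magnitude


def sample_lsm(rng, shape=2.0, rate=2.0):
    # Assumed hyperparameters: inverse scale lam ~ Gamma(shape, rate).
    # random.gammavariate takes (alpha, beta) with beta as the scale = 1 / rate.
    lam = rng.gammavariate(shape, 1.0 / rate)
    # Dividing by lam is the same as multiplying by an inverse Gamma variable.
    return sample_laplace(rng) / lam


def tail_fraction(samples, threshold):
    # Fraction of samples whose magnitude exceeds the threshold.
    return sum(abs(s) > threshold for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
laplace_samples = [sample_laplace(rng) for _ in range(n)]
lsm_samples = [sample_lsm(rng) for _ in range(n)]

laplace_tail = tail_fraction(laplace_samples, 5.0)
lsm_tail = tail_fraction(lsm_samples, 5.0)
print("Laplacian tail mass beyond 5:", laplace_tail)
print("LSM tail mass beyond 5:", lsm_tail)
```

Under these assumed settings, the mixture places noticeably more mass far from zero than the unit-scale Laplacian does, which is the heavier-tailed shape typically associated with sparsity-favouring priors.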

Index Terms: Bayesian learning, group sparsity, hidden Markov model, speech recognition


Bibliographic reference. Chien, Jen-Tzung / Chiang, Cheng-Chun (2012): "Group sparse hidden Markov models for speech recognition", in INTERSPEECH-2012, 2646-2649.