8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Probabilistic Latent Speaker Analysis for Large Vocabulary Speech Recognition

Dan Su, Xihong Wu, Huisheng Chi

Peking University, China

The trajectory folding problem is intrinsic to HMM-based speech recognition systems in which each state is modeled by a mixture of Gaussian components. In this paper, a probabilistic latent semantic analysis (PLSA)-based approach is proposed to alleviate this problem. The basic idea is that different speech trajectories are strongly correlated with speaker variation, and that different speakers may consistently score highly on certain Gaussian components. PLSA is therefore adopted to perform co-occurrence analysis between Gaussian components and speakers, providing an additional source of information to constrain the search path during decoding. Experimental results show relative word error rate reductions of 11.2% and 2.7% on a homogeneous test set and the 2004 863 evaluation set, respectively.
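
The co-occurrence analysis described above can be illustrated with a minimal sketch of standard PLSA fitted by EM. This is not the authors' implementation: the abstract does not specify how counts are collected or how the result enters decoding. The sketch assumes a hypothetical speakers-by-components count matrix (how often each Gaussian component scored highly for each speaker), and the names `plsa` and `component_prior` are introduced here for illustration only.

```python
import random


def plsa(counts, n_topics, n_iters=100, seed=0):
    """Fit standard PLSA by EM on a speakers x components count matrix.

    Returns (p_z_s, p_g_z) where
      p_z_s[s][z] = P(latent class z | speaker s)
      p_g_z[z][g] = P(Gaussian component g | latent class z)
    """
    rng = random.Random(seed)
    n_spk, n_comp = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [v / total for v in row]

    # Random positive initialisation, normalised to valid distributions.
    p_z_s = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_spk)]
    p_g_z = [normalize([rng.random() + 1e-3 for _ in range(n_comp)])
             for _ in range(n_topics)]

    for _ in range(n_iters):
        # Accumulators start at a tiny floor so no row sums to zero.
        new_z_s = [[1e-12] * n_topics for _ in range(n_spk)]
        new_g_z = [[1e-12] * n_comp for _ in range(n_topics)]
        for s in range(n_spk):
            for g in range(n_comp):
                if counts[s][g] == 0:
                    continue
                # E-step: posterior P(z | s, g) up to normalisation.
                joint = [p_z_s[s][z] * p_g_z[z][g] for z in range(n_topics)]
                norm = sum(joint)
                for z in range(n_topics):
                    resp = counts[s][g] * joint[z] / norm
                    new_z_s[s][z] += resp
                    new_g_z[z][g] += resp
        # M-step: renormalise the expected counts.
        p_z_s = [normalize(row) for row in new_z_s]
        p_g_z = [normalize(row) for row in new_g_z]
    return p_z_s, p_g_z


def component_prior(p_z_s_row, p_g_z):
    """P(g | s) = sum_z P(z | s) * P(g | z) for one speaker."""
    n_topics, n_comp = len(p_g_z), len(p_g_z[0])
    return [sum(p_z_s_row[z] * p_g_z[z][g] for z in range(n_topics))
            for g in range(n_comp)]


# Toy data (invented for illustration): speakers 0-1 favour components 0-1,
# speakers 2-3 favour components 2-3.
counts = [[30, 25, 0, 1],
          [28, 31, 1, 0],
          [0, 2, 27, 33],
          [1, 0, 35, 29]]
p_z_s, p_g_z = plsa(counts, n_topics=2)
prior_spk0 = component_prior(p_z_s[0], p_g_z)
```

In this sketch, `prior_spk0` is a speaker-dependent distribution over Gaussian components. One plausible use, stated here as an assumption, would be to reweight mixture components for a given speaker during decoding; the full paper should be consulted for the method actually used.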

Bibliographic reference. Su, Dan / Wu, Xihong / Chi, Huisheng (2007): "Probabilistic latent speaker analysis for large vocabulary speech recognition", in INTERSPEECH-2007, 1162-1165.