8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Time Adjustable Mixture Weights for Speaking Rate Fluctuation

Takahiro Shinozaki, Sadaoki Furui

Tokyo Institute of Technology, Japan

One of the most serious problems in spontaneous speech recognition is the degradation of recognition accuracy caused by speaking rate fluctuation within an utterance. This paper proposes a method for adjusting the mixture weights of an HMM frame by frame, depending on the local speaking rate. The proposed method is implemented within the Bayesian network framework. A hidden variable representing the variation of the "mode" of the speaking rate is introduced, and its value controls the mixture weights of the Gaussian mixtures. Model training and maximum probability assignment of the variables are conducted using the EM/GEM and inference algorithms for Bayesian networks. The Bayesian network is used to rescore the acoustic likelihood of the hypotheses in N-best lists. Experimental results show that the proposed method improves word accuracy by 1.6% absolute on meeting speech when the speaking rate information is given, whereas the improvement obtained with a regression HMM is less significant.
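The core idea of mode-dependent mixture weights can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional features, a single HMM state, and a speaking-rate mode that is already known for every frame, whereas the paper treats the mode as a hidden variable in a Bayesian network. All names and numbers (`frame_log_likelihood`, `WEIGHTS_BY_MODE`, the "slow" and "fast" labels) are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def frame_log_likelihood(x, means, variances, weights_by_mode, mode):
    """Log-likelihood of one frame.

    The Gaussian components are shared across modes; only the
    mixture weights are selected by the speaking-rate mode.
    """
    weights = weights_by_mode[mode]
    density = sum(
        w * gaussian_pdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )
    return math.log(density)


def utterance_log_likelihood(frames, modes, means, variances, weights_by_mode):
    """Sum of frame log-likelihoods, with one mode label per frame."""
    return sum(
        frame_log_likelihood(x, means, variances, weights_by_mode, mode)
        for x, mode in zip(frames, modes)
    )


# Hypothetical single-state model with two shared Gaussian components.
MEANS = [-1.0, 1.0]
VARIANCES = [1.0, 1.0]
# Assumed weight tables: each mode favours a different component.
WEIGHTS_BY_MODE = {"slow": [0.8, 0.2], "fast": [0.2, 0.8]}

frames = [-1.2, -0.9, 1.1, 0.8]
modes = ["slow", "slow", "fast", "fast"]  # assumed known here

adaptive = utterance_log_likelihood(
    frames, modes, MEANS, VARIANCES, WEIGHTS_BY_MODE
)
fixed = utterance_log_likelihood(
    frames, ["slow"] * len(frames), MEANS, VARIANCES, WEIGHTS_BY_MODE
)
```

In this toy setting, a per-frame score of this kind could in principle be summed over an N-best hypothesis to replace its acoustic likelihood during rescoring; how the mode is inferred and how the weights are trained is left entirely to the EM/GEM and inference procedures mentioned in the abstract.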


Bibliographic reference. Shinozaki, Takahiro / Furui, Sadaoki (2003): "Time adjustable mixture weights for speaking rate fluctuation", in EUROSPEECH-2003, 973-976.