ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

Bayesian Acoustic Modeling for Spontaneous Speech Recognition

Shinji Watanabe, Yasuhiro Minami, Atsushi Nakamura, Naonori Ueda

NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan

Constructing an accurate acoustic model for spontaneous speech recognition requires dealing with various sources of speech fluctuation, such as variations in speaking and differences among speakers. The Bayesian approach is well suited to modeling such fluctuation because, unlike the maximum likelihood approach, it enables the selection of a model appropriate to the given speech data. However, the Bayesian approach involves complicated integrals, which have prevented its application to large-scale tasks such as spontaneous speech recognition. In this paper, we apply a practical Bayesian framework, Variational Bayesian Estimation and Clustering for speech recognition (VBEC), to spontaneous speech recognition. In particular, we focus on selecting an appropriate acoustic model structure. The effectiveness of VBEC is shown through recognition experiments on real spontaneous speech data.
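
The "complicated integrals" and the model selection mentioned above can be made concrete with the generic variational Bayesian bound. The following is a minimal sketch in standard textbook notation, not the specific VBEC formulation from the paper; the symbols O, Z, \theta, m, q and \mathcal{F}_m are assumptions introduced here for illustration.

```latex
% Illustrative notation (assumed, not taken from the paper):
%   O      = observed speech feature sequence
%   Z      = latent variables (e.g. HMM state and mixture assignments)
%   \theta = acoustic model parameters
%   m      = model structure (e.g. a particular state-clustering result)
%   q      = variational approximation to the posterior over (\theta, Z)
\begin{align}
% Model evidence: the integral over parameters is what makes exact Bayes intractable.
p(O \mid m) &= \int \sum_{Z} p(O, Z \mid \theta, m)\, p(\theta \mid m)\, d\theta \\
% Jensen's inequality gives a tractable lower bound for any distribution q.
\log p(O \mid m) &\ge \mathbb{E}_{q(\theta, Z)}\!\left[ \log \frac{p(O, Z \mid \theta, m)\, p(\theta \mid m)}{q(\theta, Z)} \right] \equiv \mathcal{F}_m[q] \\
% Structure selection: pick the structure whose optimized bound is largest.
\hat{m} &= \operatorname*{arg\,max}_{m} \; \max_{q}\, \mathcal{F}_m[q]
\end{align}
```

By contrast, the maximum likelihood score \max_{\theta} p(O \mid \theta, m) never decreases as nested model structures grow, so on its own it cannot indicate when a structure is too complex for the available data.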

Bibliographic reference. Watanabe, Shinji / Minami, Yasuhiro / Nakamura, Atsushi / Ueda, Naonori (2003): "Bayesian acoustic modeling for spontaneous speech recognition", in SSPR-2003, paper MAP3.