8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


An Optimized Multi-Duration HMM for Spontaneous Speech Recognition

Yuichi Ohkawa, Akihiro Yoshida, Motoyuki Suzuki, Akinori Ito, Shozo Makino

Tohoku University, Japan

In spontaneous speech, a wide variety of changes in speaking style and speaking rate can be observed, and these changes are known to degrade speech recognition accuracy.

In this paper, we describe an optimized multi-duration HMM (OMD). An OMD is a kind of multi-path HMM with at most two parallel paths. Each path is trained using speech samples with either short or long phoneme durations. The thresholds used to divide the phoneme samples into short and long groups are determined through phoneme recognition experiments. The HMM topologies are also determined from these recognition results, not only the thresholds.
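The threshold and topology selection described above can be sketched as a search over candidate duration thresholds. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes one phoneme's samples are handled at a time, with durations measured in frames. `evaluate` is a hypothetical callback standing in for "train one path per group and run a phoneme recognition experiment". If no split beats the single-path model, the sketch keeps an ordinary one-path HMM, which respects the "at most two parallel paths" limit. `toy_evaluate` is only a placeholder score based on duration spread. It is not the paper's criterion.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class PhonemeSample:
    phoneme: str
    duration: int  # phoneme duration in frames (assumed unit)


@dataclass
class SplitDecision:
    threshold: Optional[int]  # None -> keep a single-path (ordinary) HMM
    accuracy: float


def split_by_duration(samples: Sequence[PhonemeSample], threshold: int):
    """Divide samples into a short group (< threshold) and a long group (>= threshold)."""
    short = [s for s in samples if s.duration < threshold]
    long_ = [s for s in samples if s.duration >= threshold]
    return short, long_


def choose_topology(
    samples: Sequence[PhonemeSample],
    candidates: Sequence[int],
    evaluate: Callable[[List[List[PhonemeSample]]], float],
) -> SplitDecision:
    """Pick the threshold (or no split) that maximizes the evaluation score.

    `evaluate` is a hypothetical stand-in: in practice it would train one HMM
    path per group and return phoneme recognition accuracy.
    """
    # Baseline: one path trained on all samples.
    best = SplitDecision(threshold=None, accuracy=evaluate([list(samples)]))
    for t in candidates:
        short, long_ = split_by_duration(samples, t)
        if not short or not long_:
            continue  # a path with no training samples cannot be trained
        acc = evaluate([short, long_])
        if acc > best.accuracy:  # strict: ties keep the simpler topology
            best = SplitDecision(threshold=t, accuracy=acc)
    return best


def toy_evaluate(groups: List[List[PhonemeSample]]) -> float:
    """Toy placeholder score: negative within-group sum of squared duration deviations."""
    total = 0.0
    for g in groups:
        mean = sum(s.duration for s in g) / len(g)
        total += sum((s.duration - mean) ** 2 for s in g)
    return -total


if __name__ == "__main__":
    data = [PhonemeSample("a", d) for d in [3, 4, 5, 10, 11, 12]]
    print(choose_topology(data, [4, 5, 10, 11, 12], toy_evaluate))
```

With the toy score, the durations cluster into {3, 4, 5} and {10, 11, 12}, so the search selects threshold 10. A real system would replace `toy_evaluate` with actual training and recognition runs. Those runs are costly, which is why the candidate list would usually be kept short.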

Next, we combine the OMD model in parallel with an ordinary HMM trained on spontaneous speech and an HMM trained on read speech. Using this `all-parallel' model, we obtained a 19.3% reduction in word error rate compared with the ordinary HMM trained on spontaneous speech.
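One way to picture a model whose paths run in parallel is Viterbi scoring. Each path scores the segment independently, and the model keeps the best path. In the all-parallel model, the paths would be the OMD's one or two duration paths, the spontaneous-speech HMM, and the read-speech HMM. The sketch below is a minimal, hedged illustration of that idea only. The `LeftToRightHMM` class, its discrete emission tables, the uniform path-entry weights, and the path names are all hypothetical simplifications. They are not taken from the paper, which would use acoustic models with continuous features.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def _log(p: float) -> float:
    """Log probability, with log(0) mapped to -inf."""
    return math.log(p) if p > 0 else NEG_INF


@dataclass
class LeftToRightHMM:
    """Toy left-to-right HMM with discrete emissions (hypothetical stand-in)."""
    self_loop: List[float]        # P(stay in state i)
    emit: List[Dict[str, float]]  # P(symbol | state i)

    def viterbi_log_score(self, obs: Sequence[str]) -> float:
        if not obs:
            return NEG_INF
        n = len(self.self_loop)
        # delta[i]: best log-probability of being in state i after the current frame
        delta = [NEG_INF] * n
        delta[0] = _log(self.emit[0].get(obs[0], 0.0))
        for o in obs[1:]:
            new = [NEG_INF] * n
            for i in range(n):
                stay = delta[i] + _log(self.self_loop[i])
                move = delta[i - 1] + _log(1.0 - self.self_loop[i - 1]) if i > 0 else NEG_INF
                new[i] = max(stay, move) + _log(self.emit[i].get(o, 0.0))
            delta = new
        return delta[-1]  # the path must end in its final state


def parallel_log_score(
    paths: Dict[str, LeftToRightHMM],
    obs: Sequence[str],
    entry: Optional[Dict[str, float]] = None,
) -> Tuple[str, float]:
    """Viterbi score of a parallel-path model: the best single path wins.

    Entry weights default to uniform (an assumption for illustration).
    """
    entry = entry or {name: 1.0 / len(paths) for name in paths}
    scores = {name: _log(entry[name]) + hmm.viterbi_log_score(obs) for name, hmm in paths.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    emit = [{"a": 1.0}, {"b": 1.0}]
    paths = {
        "short": LeftToRightHMM([0.1, 0.1], emit),  # favors brief segments
        "long": LeftToRightHMM([0.9, 0.9], emit),   # favors lengthened segments
    }
    print(parallel_log_score(paths, "ab")[0])
    print(parallel_log_score(paths, "aaaaabbbbb")[0])
```

Both toy paths emit the same symbols but differ in self-loop probability. A two-frame segment is therefore claimed by the short path, and a ten-frame segment by the long path. Adding more entries to `paths`, such as hypothetical "spontaneous" and "read" models, extends the same max-over-paths rule.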


Bibliographic reference.  Ohkawa, Yuichi / Yoshida, Akihiro / Suzuki, Motoyuki / Ito, Akinori / Makino, Shozo (2003): "An optimized multi-duration HMM for spontaneous speech recognition", In EUROSPEECH-2003, 485-488.