EUROSPEECH 2003 - INTERSPEECH 2003
In spontaneous speech, various changes in speaking style and speaking rate are observed, and these are known to degrade speech recognition accuracy.
In this paper, we describe an optimized multi-duration HMM (OMD). An OMD is a kind of multi-path HMM with at most two parallel paths. Each path is trained using speech samples with either short or long phoneme duration. The thresholds used to divide the samples of each phoneme are determined through a phoneme recognition experiment. Both the thresholds and the HMM topologies are determined from the recognition results.
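The division of training samples by duration can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function name `split_by_duration`, the use of frame counts as the duration unit, the rule that a duration equal to the threshold counts as long, and the `"single"` group for phonemes that keep one path are all assumptions made for the example.

```python
from collections import defaultdict


def split_by_duration(samples, thresholds):
    """Group phoneme durations into 'short', 'long', or 'single' sets.

    samples:    iterable of (phoneme, duration) pairs; the duration unit
                (frames here) is an assumption of this sketch.
    thresholds: dict mapping a phoneme to its duration threshold.
                A phoneme without a threshold keeps a single path,
                reflecting "at most two parallel paths".
    """
    groups = defaultdict(lambda: {"short": [], "long": [], "single": []})
    for phoneme, duration in samples:
        threshold = thresholds.get(phoneme)
        if threshold is None:
            # No threshold for this phoneme: all samples train one path.
            groups[phoneme]["single"].append(duration)
        elif duration < threshold:
            # Assumed convention: strictly below the threshold is short.
            groups[phoneme]["short"].append(duration)
        else:
            groups[phoneme]["long"].append(duration)
    return {phoneme: dict(group) for phoneme, group in groups.items()}
```

In this sketch, each of the "short" and "long" sets would then be used to train one path of the corresponding phoneme model, while the thresholds themselves would come from the phoneme recognition experiment described above.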
Next, we combine the OMD in parallel with an ordinary HMM trained on spontaneous speech and an HMM trained on read speech. Using this `all-parallel' model, a 19.3% reduction in word error rate was obtained compared with the ordinary HMM trained on spontaneous speech.
Bibliographic reference. Ohkawa, Yuichi / Yoshida, Akihiro / Suzuki, Motoyuki / Ito, Akinori / Makino, Shozo (2003): "An optimized multi-duration HMM for spontaneous speech recognition", In EUROSPEECH-2003, 485-488.