Conventional HMMs have weak duration constraints. In noisy conditions, the mismatch between corrupted speech signals and models trained on clean speech may cause the decoder to produce word matches with unrealistic durations. This paper presents a simple way to incorporate word duration constraints by unrolling HMMs to form a lattice where word duration probabilities can be applied directly to state transitions. The expanded HMMs are compatible with conventional Viterbi decoding. Experiments on connected-digit recognition show that when using explicit duration constraints the decoder generates word matches with more reasonable durations, and word error rates are significantly reduced across a broad range of noise conditions.
Bibliographic reference. Ma, Ning / Barker, Jon / Green, Phil (2007): "Applying word duration constraints by using unrolled HMMs", In INTERSPEECH-2007, 1066-1069.