Conventional hidden Markov models (HMMs) impose only weak constraints on duration. In noisy conditions, this can lead the decoder to hypothesise words with unrealistic durations. This paper describes techniques for modelling context-dependent word duration cues and incorporating them directly into a multi-stack decoding algorithm. The proposed model penalises hypothesised word durations according to constraints that depend on the word's context. Experiments on connected digit recognition show that the new system significantly improves recognition performance across a range of noise levels.
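The abstract does not specify the form of the duration model. The following is a minimal sketch, not the paper's method, of how a context-dependent duration penalty could be added to a word hypothesis score when a stack decoder extends a hypothesis. It assumes a Gaussian duration model in frames, uses the preceding word as the "context", and relies on a hypothetical lookup table, a hypothetical `weight` parameter, and illustrative values. The penalty is measured relative to the most likely duration, so a word with no duration model receives no penalty instead of an arbitrary offset.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DurationStats:
    """Hypothetical per-context duration statistics, measured in frames."""
    mean: float
    std: float


# Hypothetical table keyed by (word, preceding word); None marks utterance start.
# The numbers are illustrative only, not taken from the paper.
DURATION_TABLE = {
    ("seven", None): DurationStats(mean=38.0, std=6.0),
    ("seven", "seven"): DurationStats(mean=32.0, std=5.0),
    ("oh", None): DurationStats(mean=20.0, std=4.0),
}


def duration_penalty(word, context, frames):
    """Gaussian log-likelihood of `frames`, relative to the modal duration.

    Returns 0.0 at the mean duration and a more negative value the further
    the hypothesised duration is from it. Words with no entry get 0.0.
    """
    stats = DURATION_TABLE.get((word, context))
    if stats is None:
        return 0.0
    z = (frames - stats.mean) / stats.std
    return -0.5 * z * z


def rescore(acoustic_log_lik, word, context, frames, weight=1.0):
    """Combine an acoustic log-likelihood with the weighted duration penalty."""
    return acoustic_log_lik + weight * duration_penalty(word, context, frames)


if __name__ == "__main__":
    # Same acoustic score: the implausibly short "seven" is ranked lower.
    print(rescore(-100.0, "seven", None, 38))  # -100.0
    print(rescore(-100.0, "seven", None, 10))
```

Because the penalty depends on the context key, the same 32-frame "seven" is unpenalised after another "seven" but penalised at utterance start. In a real decoder, `weight` would need tuning against the acoustic scores.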
Cite as: Ma, N., Green, P. (2005) Context-dependent word duration modelling for robust speech recognition. Proc. Interspeech 2005, 2609-2612, doi: 10.21437/Interspeech.2005-241
@inproceedings{ma05_interspeech,
  author={Ning Ma and Phil Green},
  title={{Context-dependent word duration modelling for robust speech recognition}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2609--2612},
  doi={10.21437/Interspeech.2005-241}
}