5th International Conference on Spoken Language Processing
In this paper, a method of integrating a model of suprasegmental duration with an HMM-based recogniser at the post-processing level is presented. The N-best list of utterance hypotheses is rescored using a suitable linear combination of acoustic log-likelihood (provided by a set of tied-state triphone HMMs) and duration log-likelihood (provided by a set of durational models). The durational model used in post-processing imposes syllable-level elastic constraints on the durational behaviour of speech segments. Word accuracy on the Resource Management database after rescoring is reported for the following duration-scoring schemes: two different syllable-like constraint units, a fixed-size N-phone window, and simple phone duration probability scoring with no constraint.
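A minimal sketch of N-best rescoring by a linear combination of acoustic and duration log-likelihoods, assuming each hypothesis carries its HMM acoustic log-likelihood and a phone-level alignment with durations in frames. The per-phone Gaussian duration model, the `Hypothesis` fields, and the `weight` parameter are hypothetical illustrations. The scorer shown corresponds only to the simple (no constraint) phone-duration baseline, not to the paper's syllable-level elastic constraints.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Hypothesis:
    words: str
    acoustic_ll: float                   # acoustic log-likelihood from the HMM decoder
    phones: Sequence[Tuple[str, int]]    # (phone label, duration in frames) from alignment


def gaussian_log_pdf(x: float, mean: float, std: float) -> float:
    """Log density of a univariate Gaussian."""
    var = std * std
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def phone_duration_ll(phones: Sequence[Tuple[str, int]],
                      duration_model: Dict[str, Tuple[float, float]]) -> float:
    """Hypothetical no-constraint baseline: each phone's duration is scored
    independently under a per-phone Gaussian (mean, std) in frames."""
    return sum(gaussian_log_pdf(frames, *duration_model[label])
               for label, frames in phones)


def rescore(nbest: List[Hypothesis],
            duration_model: Dict[str, Tuple[float, float]],
            weight: float) -> Hypothesis:
    """Return the hypothesis maximising acoustic_ll + weight * duration_ll.
    The weight would normally be tuned on held-out data (assumption)."""
    return max(nbest, key=lambda h: h.acoustic_ll
               + weight * phone_duration_ll(h.phones, duration_model))


# Usage: the acoustic score favours "A", but its phone duration is implausible.
model = {"a": (10.0, 2.0)}
nbest = [Hypothesis("A", -100.0, [("a", 20)]),
         Hypothesis("B", -101.0, [("a", 10)])]
print(rescore(nbest, model, weight=0.0).words)
print(rescore(nbest, model, weight=1.0).words)
```

With `weight=0.0` the ranking is purely acoustic and "A" is chosen. With `weight=1.0` the duration penalty on "A" (a 20-frame phone against a 10-frame mean) outweighs its one-unit acoustic advantage, so "B" is chosen.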
Bibliographic reference. Molloy, Laurence / Isard, Stephen (1998): "Suprasegmental duration modelling with elastic constraints in automatic speech recognition", In ICSLP-1998, paper 1103.