8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Duration Normalization and Hypothesis Combination for Improved Spontaneous Speech Recognition

Jon P. Nedel, Richard M. Stern

Carnegie Mellon University, USA

When phone segmentations are known a priori, normalizing the duration of each phone has been shown to be effective in overcoming weaknesses in the duration modeling of Hidden Markov Models (HMMs). While we have observed potential relative reductions in word error rate (WER) of up to 34.6% with oracle segmentation information, it has been difficult to achieve significant improvement in WER with segmentation boundaries that are estimated blindly. In this paper, we present simple variants of our duration normalization algorithm, which use blindly estimated segmentation boundaries to produce different recognition hypotheses for a given utterance. These hypotheses can then be combined for significant improvements in WER. With oracle segmentations, WER reductions of up to 38.5% are possible. With automatically derived segmentations, this approach has reduced WER by 3.9% for the Broadcast News corpus, 6.2% for the spontaneous register of the MULT_REG corpus, and 7.7% for a spontaneous corpus of connected Spanish digits collected by Telefonica Investigacion y Desarrollo.
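To make the idea of duration normalization concrete, the following is a minimal sketch and not the authors' algorithm. The abstract does not say how phone durations are normalized, so this example assumes that each phone segment's feature frames are linearly interpolated to a fixed number of frames. The names `resample_segment`, `normalize_durations`, and `target_len` are hypothetical, as is the representation of boundaries as (start, end) frame indices.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def resample_segment(frames: Sequence[Frame], target_len: int) -> List[Frame]:
    """Linearly interpolate one phone segment to exactly target_len frames.

    Assumption: linear interpolation is used only for illustration; the
    paper's actual normalization scheme is not given in the abstract.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    if target_len < 1:
        raise ValueError("target_len must be >= 1")
    n = len(frames)
    if n == 1 or target_len == 1:
        # Degenerate cases: repeat the only frame, or keep the middle frame.
        src = frames[0] if n == 1 else frames[n // 2]
        return [list(src) for _ in range(target_len)]
    out: List[Frame] = []
    for k in range(target_len):
        # Map output index k onto a fractional position in the input segment.
        pos = k * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def normalize_durations(
    frames: Sequence[Frame],
    boundaries: Sequence[Tuple[int, int]],
    target_len: int,
) -> List[Frame]:
    """Give every phone segment the same duration (in frames).

    boundaries holds hypothetical (start, end) frame indices, end exclusive,
    which could come from an oracle or from a blind segmenter.
    """
    normalized: List[Frame] = []
    for start, end in boundaries:
        normalized.extend(resample_segment(frames[start:end], target_len))
    return normalized


if __name__ == "__main__":
    feats = [[0.0], [2.0], [4.0], [10.0], [20.0]]
    # Two segments of 3 and 2 frames, both normalized to 3 frames.
    print(normalize_durations(feats, [(0, 3), (3, 5)], target_len=3))
```

Under this assumption, running the sketch with different boundary estimates or different values of `target_len` would yield different normalized feature streams, each of which a recognizer could decode into its own hypothesis for later combination.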

Bibliographic reference. Nedel, Jon P. / Stern, Richard M. (2003): "Duration normalization and hypothesis combination for improved spontaneous speech recognition", in EUROSPEECH-2003, 1509-1512.