ISCA Archive Eurospeech 2001

Discriminative disfluency modeling for spontaneous speech recognition

Chung-Hsien Wu, Gwo-Lang Yan

Most automatic speech recognizers (ASRs) have concentrated on read speech, which differs from speech containing disfluencies. Such recognizers cannot handle speech with a high rate of disfluencies, such as filled pauses, repetitions, repairs, false starts, and silent pauses, which occur in actual spontaneous speech and dialogues. In this paper, we focus on modeling the filled pauses "uh" and "um". Filled pauses are characterized by nasality and lengthening, and the acoustic parameters for these characteristics are analyzed and adopted for disfluency modeling. We propose a Gaussian mixture model (GMM) trained by a discriminative training algorithm that minimizes the recognition error. A transition probability density function is defined from the GMM and used to weight the transition probability between the boundaries of the fluency and disfluency models in the one-stage algorithm. Experimental results show that the proposed method yields an improvement rate of 27.3% for disfluency compared with the baseline system.
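The abstract does not give the exact features, loss, or transition density function, so the following is only a minimal sketch under stated assumptions. It uses a generic minimum-classification-error (MCE) style update: two diagonal-covariance GMMs (one "fluent", one "disfluent") score a small feature vector, a sigmoid of the log-likelihood difference smooths the 0/1 error, and one gradient step moves only the mixture means. The transition weight is a swapped-in stand-in (an equal-prior posterior of the disfluent GMM) that scales a fluent-to-disfluent transition probability; it is not the paper's definition. All names (`DiagGMM`, `mce_loss`, `mce_mean_update`, `disfluency_weight`, `weighted_transition`) and the two-dimensional "nasality/duration" features are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class DiagGMM:
    """Hypothetical diagonal-covariance GMM over a small acoustic feature vector."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]

    def _log_terms(self, x: Sequence[float]) -> List[float]:
        # log(w_k) + log N(x; mu_k, diag(var_k)) for each component k
        terms = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_n = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                        for xi, m, v in zip(x, mu, var))
            terms.append(math.log(w) + log_n)
        return terms

    def log_likelihood(self, x: Sequence[float]) -> float:
        # Log-sum-exp over components for numerical stability.
        terms = self._log_terms(x)
        top = max(terms)
        return top + math.log(sum(math.exp(t - top) for t in terms))

    def posteriors(self, x: Sequence[float]) -> List[float]:
        # Component responsibilities P(k | x).
        ll = self.log_likelihood(x)
        return [math.exp(t - ll) for t in self._log_terms(x)]


def mce_loss(x, correct: DiagGMM, competitor: DiagGMM, gamma: float = 1.0) -> float:
    # Misclassification measure d > 0 means the competitor scores higher;
    # the sigmoid turns it into a smooth, differentiable error count.
    d = competitor.log_likelihood(x) - correct.log_likelihood(x)
    return _sigmoid(gamma * d)


def mce_mean_update(x, correct: DiagGMM, competitor: DiagGMM,
                    lr: float = 0.1, gamma: float = 1.0) -> None:
    """One gradient-descent step on the smoothed error, updating means only."""
    loss = mce_loss(x, correct, competitor, gamma)
    slope = gamma * loss * (1.0 - loss)  # d(loss)/d(d)
    # d(d)/d(g_correct) = -1 and d(d)/d(g_competitor) = +1, so the correct
    # model's means move toward x and the competitor's means move away from it.
    for gmm, sign in ((correct, +1.0), (competitor, -1.0)):
        post = gmm.posteriors(x)
        for k, (mu, var) in enumerate(zip(gmm.means, gmm.variances)):
            for i in range(len(mu)):
                mu[i] += sign * lr * slope * post[k] * (x[i] - mu[i]) / var[i]


def disfluency_weight(x, disfluent: DiagGMM, fluent: DiagGMM) -> float:
    # Assumed stand-in for the transition density: P(disfluent | x), equal priors.
    return _sigmoid(disfluent.log_likelihood(x) - fluent.log_likelihood(x))


def weighted_transition(a_ij: float, x, disfluent: DiagGMM, fluent: DiagGMM) -> float:
    # Scale a fluent-to-disfluent boundary transition probability by the GMM score.
    return a_ij * disfluency_weight(x, disfluent, fluent)


# Usage with toy single-component models and a hypothetical feature vector.
fluent = DiagGMM([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
disfluent = DiagGMM([1.0], [[2.0, 2.0]], [[1.0, 1.0]])
x = [1.8, 1.9]  # e.g. hypothetical nasality and duration features
before = mce_loss(x, disfluent, fluent)
mce_mean_update(x, disfluent, fluent)
after = mce_loss(x, disfluent, fluent)
print(before, after, weighted_transition(0.2, x, disfluent, fluent))
```

The sigmoid-smoothed loss is used because the raw recognition error is a step function with no useful gradient; updating only the means keeps the example short, whereas a full discriminative trainer would typically also adapt weights and variances.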


doi: 10.21437/Eurospeech.2001-461

Cite as: Wu, C.-H., Yan, G.-L. (2001) Discriminative disfluency modeling for spontaneous speech recognition. Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001), 1955-1958, doi: 10.21437/Eurospeech.2001-461

@inproceedings{wu01b_eurospeech,
  author={Chung-Hsien Wu and Gwo-Lang Yan},
  title={{Discriminative disfluency modeling for spontaneous speech recognition}},
  year=2001,
  booktitle={Proc. 7th European Conference on Speech Communication and Technology (Eurospeech 2001)},
  pages={1955--1958},
  doi={10.21437/Eurospeech.2001-461}
}