Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Detection of Filled Pauses in Spontaneous Conversational Speech

Marcel Gabrea, Douglas O’Shaughnessy

INRS-Télécommunications, Québec, Canada

Most automatic speech recognition work has concentrated on read speech, whose acoustic properties differ significantly from those of speech found in actual dialogues. A primary difference between read and spontaneous speech is the high rate of disfluencies in the latter (e.g., filled pauses, repetitions, repairs, false starts). Filled pauses (e.g., "uh," "um"), unlike silences, resemble phones that occur as parts of words in continuous speech. This paper considers the problem of detecting filled pauses in spontaneous speech and how such detection can be useful in automatic speech recognition. The acoustic aspects of filled pauses in the widely used SWITCHBOARD [1] database are examined, with a view to identifying them acoustically using a combination of duration, fundamental frequency and spectra.
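
As a rough illustration of how duration, fundamental frequency and spectra might be combined, the sketch below flags long voiced stretches whose F0 and spectrum stay nearly constant from frame to frame. It is not the method of the paper: the function name `detect_filled_pauses`, the per-frame input format, and every threshold (`min_frames`, `max_f0_step`, `max_spec_dist`) are assumptions chosen only for illustration.

```python
from typing import List, Optional, Tuple


def detect_filled_pauses(
    f0: List[Optional[float]],       # per-frame F0 in Hz, None if unvoiced
    spectra: List[List[float]],      # per-frame spectral feature vectors
    min_frames: int = 15,            # hypothetical: 150 ms at 10 ms frames
    max_f0_step: float = 0.03,       # hypothetical: max relative F0 change
    max_spec_dist: float = 0.5,      # hypothetical: max Euclidean distance
) -> List[Tuple[int, int]]:
    """Return (start, end) frame spans, end exclusive, of long steady voicing."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None      # first frame of the current steady run
    n = len(f0)
    for i in range(n + 1):           # one extra step closes a run at the end
        voiced = i < n and f0[i] is not None
        steady = False
        if voiced:
            if start is None:
                steady = True        # any voiced frame may open a run
            else:
                prev = f0[i - 1]     # voiced, since the run is contiguous
                rel = abs(f0[i] - prev) / prev
                dist = sum(
                    (a - b) ** 2 for a, b in zip(spectra[i], spectra[i - 1])
                ) ** 0.5
                steady = rel <= max_f0_step and dist <= max_spec_dist
        if steady:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_frames:
                spans.append((start, i))   # long enough: keep as candidate
            # A voiced frame that broke the run starts a new one.
            start = i if voiced else None
    return spans
```

The three cues map onto the three tests: run length stands in for duration, the relative frame-to-frame F0 change for flat pitch, and the distance between successive feature vectors for spectral stability. Any real detector would need thresholds tuned on labeled data.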


  1. Godfrey J. J., Holliman E. C., and McDaniel J. "SWITCHBOARD Telephone Speech Corpus for Research and Development". IEEE International Conference on Acoustics, Speech, and Signal Processing. San Francisco, 1992, Vol. I, pages 517-520.


Bibliographic reference.  Gabrea, Marcel / O’Shaughnessy, Douglas (2000): "Detection of filled pauses in spontaneous conversational speech", In ICSLP-2000, vol.3, 678-681.