Sixth International Conference on Spoken Language Processing
Most automatic speech recognition work has concentrated on read speech, whose acoustic properties differ significantly from those of speech found in actual dialogues. A primary difference between read speech and spontaneous speech is the high rate of disfluencies in the latter (e.g., filled pauses, repetitions, repairs, false starts). Filled pauses (e.g., "uh," "um"), unlike silences, resemble phones that occur within words in continuous speech. This paper considers the problem of detecting filled pauses in spontaneous speech and how such detection can be useful in automatic speech recognition. The acoustic aspects of filled pauses in the widely used SWITCHBOARD database are examined here, with a view to identifying them acoustically using a combination of duration, fundamental frequency, and spectra.
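As a rough illustration of how duration, fundamental frequency, and spectral cues might be combined, the sketch below flags voiced stretches that last long enough while both F0 and the spectrum stay nearly constant from frame to frame. It is not the method of the paper, which the abstract does not specify. The `Frame` type, the function names, the 10 ms frame step, and all thresholds are hypothetical choices made only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Frame:
    """One analysis frame (hypothetical layout, assumed for this sketch)."""
    f0_hz: Optional[float]      # fundamental frequency, None if unvoiced
    spectrum: Sequence[float]   # any fixed-length spectral feature vector


def spectral_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two spectral vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def detect_filled_pauses(
    frames: List[Frame],
    frame_s: float = 0.01,         # assumed frame step in seconds
    min_dur_s: float = 0.2,        # assumed minimum duration of a candidate
    max_f0_step_hz: float = 5.0,   # assumed limit on frame-to-frame F0 change
    max_spec_step: float = 0.5,    # assumed limit on frame-to-frame spectral change
) -> List[Tuple[float, float]]:
    """Return (start, end) times of long, voiced, acoustically stable runs."""
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None    # index of the first frame of the open run

    for i, frame in enumerate(frames):
        voiced = frame.f0_hz is not None
        if start is not None:
            prev = frames[i - 1]   # always voiced, since it belongs to the run
            continues = (
                voiced
                and abs(frame.f0_hz - prev.f0_hz) <= max_f0_step_hz
                and spectral_distance(frame.spectrum, prev.spectrum) <= max_spec_step
            )
            if continues:
                continue
            # The run ended at frame i; keep it only if it is long enough.
            if (i - start) * frame_s >= min_dur_s:
                segments.append((start * frame_s, i * frame_s))
            start = None
        if voiced:
            start = i              # a voiced frame opens a new run

    # Close a run that extends to the end of the input.
    if start is not None and (len(frames) - start) * frame_s >= min_dur_s:
        segments.append((start * frame_s, len(frames) * frame_s))
    return segments
```

The thresholds here are placeholders, not values from the paper; in practice they would need to be tuned on labeled data. A stability test of this kind would also flag other sustained vowels, so it is best read as a candidate generator rather than a complete detector.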
Bibliographic reference. Gabrea, Marcel / O’Shaughnessy, Douglas (2000): "Detection of filled pauses in spontaneous conversational speech", In ICSLP-2000, vol.3, 678-681.