Sixth European Conference on Speech Communication and Technology
In a continuous speech recognition system, a long waveform is usually segmented into shorter pieces based on simple acoustic criteria, such as unfilled pauses (i.e., silences). We call this an acoustic segmentation. In general, acoustic segmentations do not reflect linguistic structure: they may fragment sentences or semantic units, and they may also group unrelated units together. We therefore need to resegment acoustic segmentations to output linguistically meaningful units such as clauses. We call the result a linguistic segmentation. This paper employs several acoustic and prosodic cues to resegment acoustic segmentations and identify linguistic segmentations. Experimental results show that, using these cues, a precision of 94.46% and a recall of 87.38% can be achieved.
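As a rough illustration of how precision and recall figures of this kind are typically computed for a segmentation task, the sketch below compares predicted boundary positions against reference boundaries. The abstract does not state the paper's matching criterion, so exact-position matching is an assumption here, and the function name `boundary_precision_recall` and the toy boundary lists are hypothetical, not taken from the paper.

```python
def boundary_precision_recall(predicted, reference):
    """Score predicted segment boundaries against reference boundaries.

    Assumption: a predicted boundary counts as correct only if it matches
    a reference boundary position exactly.
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    # Precision: share of predicted boundaries that are correct.
    precision = correct / len(predicted) if predicted else 0.0
    # Recall: share of reference boundaries that were found.
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Toy boundary positions (e.g. word indices), invented for illustration only.
reference_boundaries = [3, 7, 12, 18, 24]
predicted_boundaries = [3, 7, 12, 20]

precision, recall = boundary_precision_recall(
    predicted_boundaries, reference_boundaries
)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

In this toy case three of the four predicted boundaries are correct and three of the five reference boundaries are recovered. A system with higher precision than recall, as reported above, proposes few spurious boundaries but misses some true ones.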
Bibliographic reference. Lee, Yue-Shi / Chen, Hsin-Hsi (1999): "Identifying linguistic segmentations in Chinese spoken dialogue", in EUROSPEECH'99, pp. 2003-2006.