This paper studies the "non-lexical" filled pauses and intended responses that occur in spontaneous conversational speech, and how analyzing them can benefit both automatic speech recognition (ASR) and speaker identification systems. Experiments show that prosodic features can distinguish lexical words from non-lexical words in spontaneous speech. Consequently, pre-recognition of such pauses using a decision-tree-based CART classifier is feasible. For ASR of spontaneous speech, such pauses can then be either omitted entirely or treated as words and added to the dictionary of the ASR system, thereby improving the system's performance.
Cite as: Tolba, H., O'Shaughnessy, D. (1999) Towards recognizing "non-lexical" words in spontaneous conversational speech. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 723-726, doi: 10.21437/Eurospeech.1999-175
@inproceedings{tolba99_eurospeech,
  author={Hesham Tolba and Douglas O'Shaughnessy},
  title={{Towards recognizing "non-lexical" words in spontaneous conversational speech}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={723--726},
  doi={10.21437/Eurospeech.1999-175}
}