Recognizing a speaker's interest within a human dialog has great potential in many commercial applications. In this work we therefore introduce an approach that analyses acoustic and linguistic cues of a spoken utterance. To find relevant acoustic attributes, more than 5k high-level features are systematically generated from prosodic and spectral feature contours by means of descriptive statistical analysis, followed by feature space optimization. Linguistic information is integrated through a bag-of-words representation that relies on a speech recognizer's output. One main aspect is the database of more than 2k spontaneous sub-speaker turns recorded and annotated for this analysis. Several influencing factors, such as microphone distance and ASR versus annotation of the spoken content, are discussed. Overall, remarkable performance can be reported for a running prototype that discriminates between three levels of interest.
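As a rough illustration of the two feature types named in the abstract, the following minimal sketch maps frame-level contours to fixed-length descriptive statistics and turns a transcript into bag-of-words counts. It is not the authors' implementation: the function names, the six statistics, the toy vocabulary, and the example values are all assumptions made for illustration, and the feature space optimization step is omitted.

```python
from collections import Counter
from statistics import mean, pstdev


def contour_functionals(contour):
    """Map a variable-length contour (e.g. pitch per frame) to fixed-length statistics.

    The choice of statistics here is illustrative, not the paper's feature set.
    """
    peak_index = contour.index(max(contour))
    # Relative position of the maximum in [0, 1]; 0.0 for single-frame contours.
    peak_position = peak_index / (len(contour) - 1) if len(contour) > 1 else 0.0
    return {
        "mean": mean(contour),
        "std": pstdev(contour),
        "min": min(contour),
        "max": max(contour),
        "range": max(contour) - min(contour),
        "peak_position": peak_position,
    }


def bag_of_words(transcript, vocabulary):
    """Count how often each vocabulary term occurs in the (ASR or annotated) transcript."""
    counts = Counter(transcript.lower().split())
    return [counts[word] for word in vocabulary]


def utterance_vector(contours, transcript, vocabulary):
    """Concatenate acoustic functionals and linguistic counts into one feature vector."""
    acoustic = []
    for name in sorted(contours):  # fixed contour order keeps vectors comparable
        stats = contour_functionals(contours[name])
        acoustic.extend(stats[key] for key in sorted(stats))
    return acoustic + bag_of_words(transcript, vocabulary)


# Hypothetical example input: two short contours and a toy vocabulary.
example_contours = {"pitch": [100.0, 120.0, 110.0], "energy": [0.2, 0.4, 0.6]}
example_vocabulary = ["really", "cool", "boring"]
example_vector = utterance_vector(
    example_contours, "oh that is really really cool", example_vocabulary
)
```

The resulting vector could then be passed to any standard classifier to separate the three interest levels; which classifier and which feature selection method are used is not stated in the abstract.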
Cite as: Schuller, B., Köhler, N., Müller, R., Rigoll, G. (2006) Recognition of interest in human conversational speech. Proc. Interspeech 2006, paper 1621-Tue1A3O.1, doi: 10.21437/Interspeech.2006-273
@inproceedings{schuller06_interspeech,
  author={Björn Schuller and Niels Köhler and Ronald Müller and Gerhard Rigoll},
  title={{Recognition of interest in human conversational speech}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1621-Tue1A3O.1},
  doi={10.21437/Interspeech.2006-273}
}