Interspeech'2005 - Eurospeech
We propose a conversational system that generates back-channel feedback with appropriate content and at appropriate timing by combining a finite state transducer (FST) based decoder capable of early detection with prosody analysis. In human conversation, participants do not simply take turns in strict order; listeners give back-channel feedback while their partner is still speaking. This feedback lets speakers gauge the listener's state and makes them feel comfortable speaking. Spoken dialogue systems should therefore be able to generate back-channel feedback in synchronization with the user's utterances. The appropriateness of such feedback depends on its content and its timing. The content depends strongly on the content of the dialogue partner's utterance, and the timing depends strongly on the prosody of that utterance. To determine the content of the feedback before the utterance ends, we use an FST-based speech recognizer. To extract the proper timing of the feedback, we use prosodic information, in particular the F0 and power of the utterance. We implemented these modules and applied them to a spoken dialogue system on the humanoid robot ROBISUKE. Experimental results show the effectiveness of our methods.
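The abstract does not state the timing rule itself, so the following is only a rough sketch of how F0 and power could be combined to flag candidate feedback points. It assumes frame-level inputs (F0 in Hz with 0 marking unvoiced frames, power in dB) and a simple heuristic: a cue is hypothesized when F0 has been falling over a short voiced window and power then drops sharply. The function name, the window length, the 6 dB threshold, and the refractory period are all hypothetical choices for illustration, not the authors' method or parameters.

```python
from typing import List


def find_feedback_cues(
    f0_hz: List[float],
    power_db: List[float],
    window: int = 5,             # assumed look-back length in frames
    power_drop_db: float = 6.0,  # assumed drop below the recent mean power
    refractory: int = 20,        # assumed minimum gap between cues, in frames
) -> List[int]:
    """Return frame indices where a back-channel cue is hypothesized.

    Illustrative heuristic only: falling F0 over a fully voiced window,
    followed by a frame whose power falls well below the window's mean.
    """
    if len(f0_hz) != len(power_db):
        raise ValueError("f0_hz and power_db must have the same length")

    cues: List[int] = []
    last_cue = -refractory  # allows a cue at the first eligible frame
    for t in range(window, len(f0_hz)):
        recent_f0 = f0_hz[t - window:t]
        recent_power = power_db[t - window:t]

        # All frames in the look-back window must be voiced (F0 > 0).
        voiced = all(v > 0.0 for v in recent_f0)
        # F0 at the end of the window is lower than at its start.
        falling = voiced and recent_f0[-1] < recent_f0[0]
        # Current power is well below the mean power of the window.
        mean_power = sum(recent_power) / window
        dropped = power_db[t] <= mean_power - power_drop_db

        if falling and dropped and t - last_cue >= refractory:
            cues.append(t)
            last_cue = t
    return cues
```

In a full system, a timing cue of this kind would be paired with whatever partial hypothesis the recognizer has produced by that frame, so that the content of the feedback is chosen before the utterance ends.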
Bibliographic reference. Fujie, Shinya / Fukushima, Kenta / Kobayashi, Tetsunori (2005): "Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system", In INTERSPEECH-2005, 889-892.