8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Speech Spotter: On-demand Speech Recognition in Human-Human Conversation on the Telephone or in Face-to-Face Situations

Masataka Goto (1), Koji Kitayama (2), Katsunobu Itou (3), Tetsunori Kobayashi (2)

(1) National Institute of Advanced Industrial Science and Technology (AIST), Japan
(2) Waseda University, Japan
(3) Nagoya University, Japan

This paper describes a novel speech-interface function, called "speech spotter", which enables a user to enter voice commands into a speech recognizer in the midst of natural human-human conversation. In the past, it has been difficult to use automatic speech recognition in human-human conversation because it was not easy to judge, from microphone input alone, whether a user was speaking to another person or to a speech recognizer. We solve this problem by using two kinds of nonverbal speech information: a filled pause (a vowel-lengthening hesitation like "er...") and voice pitch. A voice command is accepted by the speech recognizer only when the user utters it with a high pitch just after a filled pause. Using this speech-spotter function, we have built two application systems: an on-demand information system for assisting human-human conversation and a music-playback system for enriching telephone conversation. The results from using these systems have shown that the speech-spotter function is robust and convenient enough to be used in face-to-face or cellular-phone conversations.
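The acceptance rule stated in the abstract (a command counts only when it is spoken with a high pitch just after a filled pause) can be illustrated with a minimal sketch. This is not the authors' implementation. The `Segment` type, the `spot_commands` function, the per-speaker baseline pitch, the ratio threshold `pitch_ratio`, and the gap limit `max_gap_s` are all hypothetical names and values chosen for illustration, and the sketch assumes that filled-pause detection and pitch estimation have already been done upstream.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One pre-segmented stretch of speech (hypothetical structure)."""
    kind: str          # "filled_pause" or "utterance" (assumed labels)
    start_s: float     # start time in seconds
    end_s: float       # end time in seconds
    mean_f0_hz: float  # mean voice pitch over the segment
    text: str = ""     # recognized words, if any


def spot_commands(
    segments: List[Segment],
    baseline_f0_hz: float,
    pitch_ratio: float = 1.3,  # assumed: "high pitch" = 30% above baseline
    max_gap_s: float = 1.0,    # assumed: "just after" = within one second
) -> List[str]:
    """Return the utterances that satisfy both nonverbal cues."""
    accepted = []
    prev = None
    for seg in segments:
        follows_filled_pause = (
            prev is not None
            and prev.kind == "filled_pause"
            and seg.start_s - prev.end_s <= max_gap_s
        )
        is_high_pitch = seg.mean_f0_hz >= pitch_ratio * baseline_f0_hz
        if seg.kind == "utterance" and follows_filled_pause and is_high_pitch:
            accepted.append(seg.text)
        prev = seg
    return accepted


# Illustrative conversation with made-up timings and pitch values.
example = [
    Segment("utterance", 0.0, 1.5, 120.0, "see you tomorrow"),
    Segment("filled_pause", 2.0, 2.6, 118.0, "er"),
    Segment("utterance", 2.8, 3.6, 180.0, "play music"),   # both cues present
    Segment("utterance", 5.0, 5.5, 185.0, "really"),       # no filled pause
    Segment("filled_pause", 6.0, 6.5, 119.0, "er"),
    Segment("utterance", 6.6, 7.2, 125.0, "I think so"),   # pitch too low
]

commands = spot_commands(example, baseline_f0_hz=120.0)
print(commands)
```

In this sketch, requiring both cues together is what keeps ordinary conversation from triggering the recognizer: a high-pitched remark without a preceding filled pause is ignored, and so is a normal-pitched phrase that happens to follow a hesitation.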


Bibliographic reference. Goto, Masataka / Kitayama, Koji / Itou, Katsunobu / Kobayashi, Tetsunori (2004): "Speech spotter: on-demand speech recognition in human-human conversation on the telephone or in face-to-face situations", In INTERSPEECH-2004, 1533-1536.