8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Speech Starter: Noise-Robust Endpoint Detection by Using Filled Pauses

Koji Kitayama (1), Masataka Goto (2), Katunobu Itou (2), Tetsunori Kobayashi (1)

(1) Waseda University, Japan
(2) AIST, Japan

In this paper we propose a speech interface function, called speech starter, that enables noise-robust endpoint (utterance) detection for speech recognition. When current speech recognizers are used in a noisy environment, a typical cause of recognition errors is incorrect endpoints, because automatic endpoint detection is easily disturbed by non-stationary noise. The speech starter function lets a user specify the beginning of each utterance by uttering a filler with a filled pause, which is used as a trigger to start the speech-recognition process. Since filled pauses can be detected robustly in a noisy environment, practical endpoint detection is achieved. Speech starter also provides a hands-free speech interface, and it is user-friendly because speakers tend to utter filled pauses (e.g., "er...") at the beginning of utterances when hesitating in human-human communication. Experimental results from a 10-dB-SNR noisy environment show that the recognition error rate with speech starter was lower than with conventional endpoint-detection methods.
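
The triggering idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that some upstream detectors have already labelled each audio frame with two hypothetical flags, `is_filled_pause` and `is_speech`, and it only shows how a filled pause could gate the start of an utterance. The names `Frame` and `find_utterances`, the `hangover` parameter, and the choice to exclude the filler itself from the utterance are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Frame:
    index: int
    is_filled_pause: bool  # hypothetical flag from a filled-pause detector
    is_speech: bool        # hypothetical flag from a speech/non-speech detector


def find_utterances(frames: Iterable[Frame], hangover: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of utterances that follow a filled pause.

    Speech-like frames that are not preceded by a filled pause are ignored,
    which is how this sketch models robustness to noise bursts.
    """
    utterances: List[Tuple[int, int]] = []
    state = "idle"  # idle -> triggered -> recording -> idle
    start = last_speech = 0
    silence = 0

    for f in frames:
        if state == "idle":
            # Wait for the user's filled pause (the trigger).
            if f.is_filled_pause:
                state = "triggered"
        elif state == "triggered":
            # The utterance begins at the first speech frame after the filler.
            if f.is_speech and not f.is_filled_pause:
                state = "recording"
                start = last_speech = f.index
                silence = 0
        else:  # recording
            if f.is_speech:
                last_speech = f.index
                silence = 0
            else:
                silence += 1
                # Close the utterance after `hangover` non-speech frames.
                if silence >= hangover:
                    utterances.append((start, last_speech))
                    state = "idle"

    # Flush an utterance that is still open when the stream ends.
    if state == "recording":
        utterances.append((start, last_speech))
    return utterances


# Usage: 's' = speech-like frame, 'f' = filled pause, '.' = non-speech.
labels = "ss...fffssss....ss"
frames = [Frame(i, c == "f", c in "fs") for i, c in enumerate(labels)]
print(find_utterances(frames))  # [(8, 11)]
```

In this toy input, the speech-like frames at the start and end are never passed to recognition because no filled pause precedes them; only the segment that follows the filler is returned.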

Bibliographic reference.  Kitayama, Koji / Goto, Masataka / Itou, Katunobu / Kobayashi, Tetsunori (2003): "Speech starter: noise-robust endpoint detection by using filled pauses", In EUROSPEECH-2003, 1237-1240.