8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Environmental Sound Source Identification Based on Hidden Markov Model for Robust Speech Recognition

Takanobu Nishiura (1), Satoshi Nakamura (2), Kazuhiro Miki (3), Kiyohiro Shikano (3)

(1) Wakayama University, Japan
(2) ATR-SLT, Japan
(3) Nara Institute of Science and Technology, Japan

In real acoustic environments, humans communicate with each other through speech by focusing on the target speech among environmental sounds; we can easily distinguish the target sound from other environmental sounds. For hands-free speech recognition, identifying the target speech among environmental sounds is imperative. Such a mechanism may also be important for a self-moving robot that must sense its acoustic environment and communicate with humans. This paper therefore first proposes Hidden Markov Model (HMM)-based environmental sound source identification. Environmental sounds are modeled by three-state HMMs and evaluated on 92 kinds of environmental sounds, yielding an identification accuracy of 95.4%. This paper also proposes a new HMM composition method that composes speech HMMs with an HMM of categorized environmental sounds, for robust recognition of speech to which environmental sounds have been added. Evaluation experiments confirmed that the proposed HMM composition outperforms the conventional composition of speech HMMs with a noise (environmental sound) HMM trained on the noise periods preceding the target speech in the captured signal.
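The identification step described above can be sketched as maximum-likelihood classification: each sound class has its own HMM, and an input is assigned to the class whose HMM yields the highest forward log-likelihood. This is a minimal sketch under stated assumptions, not the authors' implementation. The three states are assumed to form a left-to-right chain with a single diagonal-covariance Gaussian per state, and parameters are assumed to be already trained (e.g. by Baum-Welch). The class names "bell" and "door", the helper names, the self-loop probability `p_stay`, and the 2-D toy features are all hypothetical.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian for each frame of x (T x D)."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def forward_log_likelihood(x, log_pi, log_A, means, variances):
    """Log-domain forward algorithm: returns log p(x | HMM).

    x: (T, D) feature frames; log_pi: (S,); log_A: (S, S);
    means, variances: (S, D) per-state Gaussian parameters (assumed pre-trained).
    """
    S = len(log_pi)
    # Emission log-likelihoods for every frame and state: shape (T, S).
    log_b = np.stack([log_gauss_diag(x, means[s], variances[s]) for s in range(S)], axis=1)
    alpha = log_pi + log_b[0]
    for t in range(1, len(x)):
        # alpha_t(j) = logsumexp_i(alpha_{t-1}(i) + log A[i, j]) + log b_j(x_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)

def left_to_right_3state(means, variances, p_stay=0.6):
    """Hypothetical three-state left-to-right topology (assumption, not from the paper)."""
    log_pi = np.array([0.0, -np.inf, -np.inf])  # always start in the first state
    A = np.array([[p_stay, 1.0 - p_stay, 0.0],
                  [0.0, p_stay, 1.0 - p_stay],
                  [0.0, 0.0, 1.0]])
    with np.errstate(divide="ignore"):  # log(0) -> -inf for forbidden transitions
        log_A = np.log(A)
    return log_pi, log_A, means, variances

def identify(x, models):
    """Return the class whose HMM gives the highest log-likelihood, plus all scores."""
    scores = {name: forward_log_likelihood(x, *m) for name, m in models.items()}
    return max(scores, key=scores.get), scores

# Toy usage with two hypothetical sound classes and 2-D features.
D = 2
models = {
    "bell": left_to_right_3state(np.array([[3.0, 3.0], [3.5, 2.5], [3.0, 2.0]]), np.ones((3, D))),
    "door": left_to_right_3state(np.array([[-3.0, -3.0], [-2.5, -3.5], [-2.0, -3.0]]), np.ones((3, D))),
}
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=(20, D))
label, scores = identify(x, models)
print(label)
```

Working in the log domain with `np.logaddexp` keeps the forward recursion numerically stable over long inputs. In a real system, `x` would be acoustic feature frames such as cepstral coefficients, and `models` would hold one trained HMM per environmental sound class.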


Bibliographic reference.  Nishiura, Takanobu / Nakamura, Satoshi / Miki, Kazuhiro / Shikano, Kiyohiro (2003): "Environmental sound source identification based on hidden Markov model for robust speech recognition", In EUROSPEECH-2003, 2157-2160.