EUROSPEECH 2003 - INTERSPEECH 2003
In real acoustic environments, humans communicate through speech by focusing on the target speech among environmental sounds, and can easily distinguish the target sound from other environmental sounds. For hands-free speech recognition, identifying the target speech among environmental sounds is imperative. This mechanism may also be important for a self-moving robot that must sense its acoustic environment and communicate with humans. This paper therefore first proposes environmental sound source identification based on hidden Markov models (HMMs). Environmental sounds are modeled by three-state HMMs and evaluated using 92 kinds of environmental sounds; the identification accuracy was 95.4%. The paper also proposes a new HMM composition method that composes speech HMMs with an HMM of categorized environmental sounds, for robust recognition of speech with added environmental sounds. Evaluation experiments confirmed that the proposed HMM composition outperforms conventional HMM composition, which combines speech HMMs with a noise (environmental sound) HMM trained on the noise periods preceding the target speech in the captured signal.
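The identification step described above can be illustrated with a minimal sketch: one three-state HMM per sound class, with the input assigned to the class whose model gives the highest likelihood. The abstract does not specify the features, the topology, or the emission densities, so everything below is an assumption made for illustration: a left-to-right topology, one-dimensional Gaussian emissions, a maximum-likelihood decision rule, and the hypothetical class names `rising` and `falling`. It is not the authors' implementation.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (assumed emission model)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_likelihood(obs, hmm):
    """Log P(obs | hmm) computed with the forward algorithm in the log domain."""
    n = len(hmm["start"])
    alpha = [
        safe_log(hmm["start"][j]) + log_gauss(obs[0], hmm["means"][j], hmm["vars"][j])
        for j in range(n)
    ]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + safe_log(hmm["trans"][i][j]) for i in range(n)])
            + log_gauss(x, hmm["means"][j], hmm["vars"][j])
            for j in range(n)
        ]
    return logsumexp(alpha)


def identify(obs, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, models[name]))


def left_to_right(means, variance=1.0, stay=0.6):
    """Build a hypothetical three-state left-to-right HMM."""
    return {
        "start": [1.0, 0.0, 0.0],
        "trans": [
            [stay, 1.0 - stay, 0.0],
            [0.0, stay, 1.0 - stay],
            [0.0, 0.0, 1.0],
        ],
        "means": list(means),
        "vars": [variance] * 3,
    }


# Two toy sound classes (hypothetical; real models would be trained on data).
MODELS = {
    "rising": left_to_right([0.0, 2.0, 4.0]),
    "falling": left_to_right([4.0, 2.0, 0.0]),
}

if __name__ == "__main__":
    print(identify([0.1, 0.2, 1.9, 2.1, 3.8, 4.2], MODELS))
```

In a real system each frame would be a multi-dimensional feature vector and the model parameters would be estimated from training recordings rather than set by hand; the decision rule itself would keep the same shape.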
Bibliographic reference. Nishiura, Takanobu / Nakamura, Satoshi / Miki, Kazuhiro / Shikano, Kiyohiro (2003): "Environmental sound source identification based on hidden Markov model for robust speech recognition", In EUROSPEECH-2003, 2157-2160.