11th Annual Conference of the International Speech Communication Association

Makuhari, Chiba, Japan
September 26-30, 2010

Effects of Modelling Within- and Between-Frame Temporal Variations in Power Spectra on Non-Verbal Sound Recognition

Nobuhide Yamakawa (1), Tetsuro Kitahara (2), Toru Takahashi (1), Kazunori Komatani (1), Tetsuya Ogata (1), Hiroshi G. Okuno (1)

(1) Kyoto University, Japan
(2) Nihon University, Japan

Research on environmental sound recognition has progressed less than research on speech and music signals. One reason is that the category of environmental sounds covers a broad range of acoustic characteristics. We classified these sounds by their characteristics in order to explore a suitable recognition technique for each type. We focused on impulsive sounds and their non-stationary features, both within and between analysis frames. We used matching pursuit as a framework for applying wavelet analysis to extract the temporal variation of audio features within a frame. We also investigated the validity of modeling the decay patterns of sounds with Hidden Markov Models (HMMs). Experimental results indicate that sounds containing multiple impulsive signals are recognized better with time-frequency analysis bases than with frequency-domain analysis. Classification of sound classes with a long, clear decay pattern improves when multiple HMMs are applied.
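The abstract does not specify the dictionary used with matching pursuit. The sketch below assumes a real Gabor dictionary (Gaussian-windowed cosines indexed by time center, frequency, and scale, with phase fixed at zero) and illustrates how greedy atom selection can localize several impulsive events inside one analysis frame. The function names `gabor_atom`, `build_dictionary`, and `matching_pursuit` and all parameter grids are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def gabor_atom(n, center, freq, scale):
    """Unit-norm real Gabor atom (assumed form): Gaussian window times a zero-phase cosine."""
    t = np.arange(n)
    g = np.exp(-np.pi * ((t - center) / scale) ** 2) * np.cos(2 * np.pi * freq * (t - center))
    return g / np.linalg.norm(g)

def build_dictionary(n, scales, freqs, hop):
    """Stack atoms for every (center, freq, scale) on a hypothetical grid; freqs in cycles/sample."""
    atoms, params = [], []
    for s in scales:
        for f in freqs:
            for c in range(0, n, hop):
                atoms.append(gabor_atom(n, c, f, s))
                params.append((c, f, s))
    return np.array(atoms), params

def matching_pursuit(x, D, n_iter):
    """Greedy MP: repeatedly pick the atom most correlated with the residual and subtract it."""
    residual = x.astype(float).copy()
    picks = []
    for _ in range(n_iter):
        corr = D @ residual                   # inner products with all unit-norm atoms
        k = int(np.argmax(np.abs(corr)))      # best-matching atom index
        picks.append((k, corr[k]))
        residual -= corr[k] * D[k]            # remove its projection from the residual
    return picks, residual

# Usage: a 256-sample frame holding two short events at different times and frequencies.
n = 256
D, params = build_dictionary(n, scales=[8, 16, 32], freqs=[0.05, 0.1, 0.25, 0.4], hop=8)
x = 2.0 * gabor_atom(n, 64, 0.1, 16) + gabor_atom(n, 192, 0.25, 16)
picks, residual = matching_pursuit(x, D, n_iter=2)
print([params[k] for k, _ in picks])  # (center, freq, scale) of each selected atom
```

Because each selected atom carries its own time center, the decomposition records when energy occurs inside the frame, which a single frame-level power spectrum discards.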


Bibliographic reference.  Yamakawa, Nobuhide / Kitahara, Tetsuro / Takahashi, Toru / Komatani, Kazunori / Ogata, Tetsuya / Okuno, Hiroshi G. (2010): "Effects of modelling within- and between-frame temporal variations in power spectra on non-verbal sound recognition", In INTERSPEECH-2010, 2342-2345.