Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

A Model for Selective Segregation of a Target Instrument Sound from the Mixed Sound of Various Instruments

Masashi Unoki, Masaaki Kubo, Atsushi Haniu, Masato Akagi

JAIST, Japan

In this paper, as a first step towards constructing a selective sound segregation model for use in modeling phenomena such as the cocktail party effect, we consider a basic problem of selective sound segregation for instrument sounds using a single-channel method (monaural processing). We propose a model concept for selective sound segregation based on auditory scene analysis and then describe the implementation of a model for segregating a target instrument sound from the mixed sound of various instruments. The proposed model consists of two blocks: a model for segregating two acoustic sources (bottom-up processing) and selective processing based on knowledge sources (top-down processing). Two simulations were carried out to evaluate the proposed model combining bottom-up and top-down processing. The results showed that the model could selectively segregate the target instrument sound from the mixed sound by using prior information, and that using both bottom-up and top-down processing was more effective than using either alone. Since these simulations can be interpreted as representing concurrent vowel segregation in the case of a speech signal, it should be possible to extend the proposed model to a selective speech segregation model.

