Interspeech'2005 - Eurospeech
As a first step towards constructing a selective sound segregation model that can account for phenomena such as the cocktail party effect, this paper considers a basic problem: selective segregation of instrument sounds using a single-channel method (monaural processing). We propose a model concept for selective sound segregation based on auditory scene analysis and then describe the implementation of a model for segregating a target instrument sound from the mixed sound of various instruments. The proposed model consists of two blocks: a model for segregating two acoustic sources (bottom-up processing) and selective processing based on knowledge sources (top-down processing). Two simulations were conducted to evaluate the proposed model combining bottom-up and top-down processing. The results showed that the model could selectively segregate the target instrument sound from the mixed sound by using prior information, and that using bottom-up and top-down processing together was more effective than using either alone. Since these simulations can be interpreted as representing concurrent vowel segregation in the case of a speech signal, it should be possible to extend the proposed model to a selective speech segregation model.
Bibliographic reference. Unoki, Masashi / Kubo, Masaaki / Haniu, Atsushi / Akagi, Masato (2005): "A model for selective segregation of a target instrument sound from the mixed sound of various instruments", In INTERSPEECH-2005, 2097-2100.