In this paper, as a first step towards constructing a selective sound segregation model for use in modeling phenomena such as the cocktail party effect, we consider the basic problem of selectively segregating instrument sounds using a single-channel (monaural) method. We propose a model concept for selective sound segregation based on auditory scene analysis and then describe the implementation of a model for segregating a target instrument sound from the mixed sound of various instruments. The proposed model consists of two blocks: bottom-up processing, which segregates two acoustic sources, and top-down processing, which performs selection based on knowledge sources. Two simulations were carried out to evaluate the proposed model combining bottom-up and top-down processing. The results showed that the model could selectively segregate the target instrument sound from the mixed sound by using prior information, and that using bottom-up and top-down processing together was more effective than using either alone. Since these simulations can be interpreted as representing concurrent vowel segregation in the case of a speech signal, it should be possible to extend the proposed model to a selective speech segregation model.
Cite as: Unoki, M., Kubo, M., Haniu, A., Akagi, M. (2005) A model for selective segregation of a target instrument sound from the mixed sound of various instruments. Proc. Interspeech 2005, 2097-2100, doi: 10.21437/Interspeech.2005-684
@inproceedings{unoki05_interspeech,
  author={Masashi Unoki and Masaaki Kubo and Atsushi Haniu and Masato Akagi},
  title={{A model for selective segregation of a target instrument sound from the mixed sound of various instruments}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2097--2100},
  doi={10.21437/Interspeech.2005-684}
}