Sequential organization in cochannel speech has previously been studied using speaker-model-based methods. A major limitation of these methods is that they require pretrained speaker models and prior knowledge (or detection) of the participating speakers. We propose an unsupervised clustering approach to sequential organization of cochannel speech. Given enhanced cepstral features, we search for the optimal assignment of simultaneous speech streams to speakers by maximizing the ratio of between-cluster to within-cluster scatter, with a penalty for assigning concurrent pitches to the same speaker. A genetic algorithm is employed to speed up the search. Our method does not require trained speaker models, and experiments with both ideal and estimated simultaneous streams show that it outperforms a speaker-model-based method in both speech segregation and computational efficiency.
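The objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: each simultaneous stream is summarized by one feature vector, the scatter ratio is taken as trace(S_w^-1 S_b), the pitch penalty is approximated by the number of frames in which two streams assigned to the same speaker overlap in time, and the search is exhaustive rather than a genetic algorithm. The names `scatter_ratio`, `overlap_penalty`, `best_assignment`, and `weight`, as well as the toy data, are hypothetical.

```python
import itertools
import numpy as np

def scatter_ratio(features, labels):
    """Return trace(S_w^-1 S_b) for a two-cluster labelling of the streams."""
    overall_mean = features.mean(axis=0)
    dim = features.shape[1]
    s_w = np.zeros((dim, dim))  # within-cluster scatter
    s_b = np.zeros((dim, dim))  # between-cluster scatter
    for k in (0, 1):
        members = features[labels == k]
        if len(members) == 0:
            return -np.inf  # reject labellings that leave a speaker empty
        mean_k = members.mean(axis=0)
        centered = members - mean_k
        s_w += centered.T @ centered
        diff = (mean_k - overall_mean)[:, None]
        s_b += len(members) * (diff @ diff.T)
    # Small ridge term (an assumption) keeps S_w invertible.
    s_w += 1e-6 * np.eye(dim)
    return float(np.trace(np.linalg.solve(s_w, s_b)))

def overlap_penalty(spans, labels):
    """Count frames shared by pairs of streams given to the same speaker."""
    total = 0
    for i, j in itertools.combinations(range(len(spans)), 2):
        if labels[i] == labels[j]:
            start = max(spans[i][0], spans[j][0])
            end = min(spans[i][1], spans[j][1])
            total += max(0, end - start)
    return total

def best_assignment(features, spans, weight=1.0):
    """Exhaustively search two-speaker labellings (stand-in for the GA)."""
    n = len(features)
    best_score, best_labels = -np.inf, None
    # Fix the first stream to speaker 0 to remove the label-swap symmetry.
    for bits in itertools.product((0, 1), repeat=n - 1):
        labels = np.array((0,) + bits)
        score = scatter_ratio(features, labels)
        score -= weight * overlap_penalty(spans, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels, best_score

# Toy data (hypothetical): six streams, 2-D features, (start, end) frame spans.
features = np.array([
    [0.0, 0.0], [0.4, 0.2], [0.1, 0.5],   # streams resembling one speaker
    [5.0, 5.0], [5.3, 4.6], [4.8, 5.4],   # streams resembling another
])
spans = [(0, 10), (12, 20), (25, 35), (5, 14), (16, 27), (30, 40)]

labels, score = best_assignment(features, spans)
print(labels, score)
```

On this toy data the search groups the first three streams against the last three, since that labelling has compact clusters and no same-speaker overlap. Exhaustive search doubles in cost with every added stream, which is the motivation for a faster search strategy such as the genetic algorithm the abstract mentions.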
Bibliographic reference. Hu, Ke / Wang, DeLiang (2010): "Unsupervised sequential organization for cochannel speech separation", In INTERSPEECH-2010, 2790-2793.