This paper shows experimentally that a new frequency-domain blind signal separation method significantly improves both the speaker signal-to-interference ratio (SIR) and the phoneme recognition score of a continuous-speech, speaker-independent acoustic decoder in an environment with two simultaneous speakers. The two-sensor separation method is based on evolutionary minimization of the cross-correlation between the separated speech signals. Extensive experiments were conducted on three types of artificially created mixtures: instantaneous, time-delayed, and convolutive, using real room impulse responses. In the worst case (the convolutive mixture), the proposed GaBSS method achieves a mean SIR improvement of 11 dB on both output channels. Furthermore, the phoneme recognition rate on the separated signals approaches the rate measured on the clean signals in all experiments. The improvement in recognition rate is greatest for convolutive mixing of equal-energy speech signals.
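The idea of evolutionary minimization of output cross-correlation can be illustrated with a minimal sketch. It is not the GaBSS method itself: it works in the time domain on an instantaneous 2x2 mixture only, uses first-order autoregressive noise as a stand-in for speech, and assumes a simple elitist genetic search. The mixing gains `A` and `B`, the demixing structure in `unmix`, and all function names and parameter values are hypothetical choices made for illustration.

```python
import math
import random

A, B = 0.6, 0.4  # hypothetical cross-talk gains of the instantaneous mixture

def ar1(n, coeff, rng):
    """Colored stand-in for a speech source: first-order autoregressive noise."""
    out, prev = [], 0.0
    for _ in range(n):
        prev = coeff * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out

def unmix(x1, x2, w):
    """Assumed demixing structure: y1 = x1 - w[0]*x2, y2 = x2 - w[1]*x1."""
    y1 = [a - w[0] * b for a, b in zip(x1, x2)]
    y2 = [b - w[1] * a for a, b in zip(x1, x2)]
    return y1, y2

def cross_corr_cost(y1, y2, max_lag=2):
    """Sum of squared normalized cross-correlations over lags -max_lag..max_lag."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    a = [v - m1 for v in y1]
    b = [v - m2 for v in y2]
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b)) or 1.0
    cost = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = sum(a[i] * b[i + lag] for i in range(n - lag))
        else:
            c = sum(a[i - lag] * b[i] for i in range(n + lag))
        cost += (c / norm) ** 2
    return cost

def sir_db(target, interference):
    """Signal-to-interference ratio in dB from the two signal components."""
    p_t = sum(v * v for v in target)
    p_i = sum(v * v for v in interference)
    return math.inf if p_i == 0 else 10.0 * math.log10(p_t / p_i)

def evolve(x1, x2, rng, pop_size=16, generations=20, sigma=0.1):
    """Elitist genetic search over the two demixing weights."""
    def fitness(w):
        return cross_corr_cost(*unmix(x1, x2, w))
    # Seed the population with "no demixing" so the result is never worse than it.
    pop = [(0.0, 0.0)] + [(rng.uniform(-1, 1), rng.uniform(-1, 1))
                          for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[: pop_size // 4]          # survivors are kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p, q = rng.sample(elite, 2)
            t = rng.random()                  # blend crossover plus Gaussian mutation
            children.append(tuple(t * pi + (1 - t) * qi + rng.gauss(0.0, sigma)
                                  for pi, qi in zip(p, q)))
        pop = elite + children
    return min(pop, key=fitness)

rng = random.Random(0)
s1, s2 = ar1(1000, 0.9, rng), ar1(1000, -0.5, rng)
x1 = [u + A * v for u, v in zip(s1, s2)]      # sensor 1: s1 plus leaked s2
x2 = [B * u + v for u, v in zip(s1, s2)]      # sensor 2: s2 plus leaked s1
best = evolve(x1, x2, rng)

# By linearity, output 1 contains (1 - w0*B)*s1 as target and (A - w0)*s2 as interference.
sir_in = sir_db(s1, [A * v for v in s2])
sir_out = sir_db([(1 - best[0] * B) * u for u in s1], [(A - best[0]) * v for v in s2])
print("weights:", best, "cost:", cross_corr_cost(*unmix(x1, x2, best)))
print("channel 1 SIR in/out (dB):", sir_in, sir_out)
```

The cost sums squared correlations over several lags because zero-lag decorrelation alone does not determine both demixing weights; the sources here are given different spectra for the same reason. The SIR figures can only be computed this way in simulation, where the source components are known separately.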
Cite as: Koutras, A., Dermatas, E., Kokkinakis, G. (1999) Recognizing simultaneous speech: a genetic algorithm approach. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 2551-2554, doi: 10.21437/Eurospeech.1999-560
@inproceedings{koutras99_eurospeech,
  author={Athanasios Koutras and Evangelos Dermatas and George Kokkinakis},
  title={{Recognizing simultaneous speech: a genetic algorithm approach}},
  year=1999,
  booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)},
  pages={2551--2554},
  doi={10.21437/Eurospeech.1999-560}
}