Sixth European Conference on Speech Communication and Technology
We propose a new cocktail-party speech recognition technique that couples a CASA-labelling method based on the TDOA (Time Delay Of Arrival) with multistream recognition, as an alternative to the classical "segregate and recognise" architecture. First, we recorded a stereo database, ST-NB95, from the mono Numbers95 database. It consists of binary mixtures of sentences at 0 dB, with the two sentences placed left and right. A mapping function assigns to each subband time frame the probability of the labels "left" and "right". This mapping depends on the relative level and is established a priori from a reference database of isolated words recorded under the same conditions. We then adapt the recognition paradigm to this particular situation. The model's WER on binary mixtures is about 50%, a large improvement over the 73% WER of the fullband PLP. We conclude that the model is able to recognise the dominant words of a binary mixture.
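As a rough illustration of the labelling idea, the sketch below estimates a TDOA for one frame of a stereo signal from the peak of the cross-correlation, then maps it to a probability of the label "left". It is a minimal sketch under stated assumptions, not the paper's method: the function names, the `max_lag` and `slope` parameters, and the logistic mapping are all hypothetical stand-ins. In the paper the mapping is learned a priori from a reference database and also depends on the relative level, which this sketch ignores.

```python
import numpy as np


def estimate_tdoa(left, right, sample_rate, max_lag):
    """Delay (in seconds) of `right` relative to `left`.

    Found as the lag that maximises the cross-correlation
    sum_n left[n] * right[n + lag], searched over [-max_lag, max_lag].
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        if lag >= 0:
            a, b = left[:len(left) - lag], right[lag:]
        else:
            a, b = left[-lag:], right[:len(right) + lag]
        scores.append(np.dot(a, b))
    best_lag = int(lags[int(np.argmax(scores))])
    return best_lag / sample_rate


def label_probability(tdoa, slope=5000.0):
    """Hypothetical logistic mapping from a TDOA to P(label = "left").

    A positive TDOA means the right channel lags, i.e. the source is
    nearer the left microphone. `slope` (per second) is an assumed value.
    """
    return float(1.0 / (1.0 + np.exp(-slope * tdoa)))


# Toy frame: white noise reaching the right channel 3 samples late.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
left_frame = source
right_frame = np.roll(source, 3)

tdoa = estimate_tdoa(left_frame, right_frame, sample_rate=8000, max_lag=10)
p_left = label_probability(tdoa)
```

In a subband version of this sketch, the same two functions would be applied to each band-pass filtered frame, giving one probability per subband time frame that a multistream recogniser could then use as a weight.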
Bibliographic reference. Glotin, Hervé / Berthommier, Frédéric / Tessier, Emmanuel (1999): "A CASA-labelling model using the localisation cue for robust cocktail-party speech recognition", In EUROSPEECH'99, 2351-2354.