ISCA Archive Interspeech 2005

Soft harmonic masks for recognising speech in the presence of a competing speaker

André Coy, Jon Barker

The paper addresses the problem of recognising speech in the presence of a competing speaker. It uses a two-stage ‘Speech Fragment Decoding’ system. The system first segments a spectro-temporal representation of the mixture into a number of fragments, such that each fragment is dominated by a single source. An ASR search is then extended to find the combination of speech model sequence and fragment subset that best fits a set of clean speech models. This paper extends previous work by combining ‘Speech Fragment Decoding’ with soft missing data techniques to better handle spectro-temporal regions that cannot be confidently ascribed to either foreground or background. Recognition experiments are performed on a connected digit task using 0 dB mixtures of simultaneous mixed-gender speakers. The incorporation of soft decisions leads to an increase in system performance from 66.9% to 72.2%.
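To illustrate what a soft missing data decision can look like, the sketch below scores one spectral frame against a single diagonal Gaussian state. It assumes one common formulation of soft missing data, in which each spectro-temporal point carries a mask value w in [0, 1] that interpolates between a "present" likelihood and a bounded marginal over [0, x]. The abstract does not give the paper's exact formulation, so the function names, the single-Gaussian state model, and this interpolation form are assumptions made for illustration only.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, mu, sigma):
    """Cumulative distribution of a univariate Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def soft_missing_likelihood(x, w, mu, sigma):
    """Soft missing data likelihood for one spectral channel (assumed form).

    x  : observed energy of the mixture in this channel (must be > 0)
    w  : soft mask value in [0, 1]; 1 = confidently foreground speech
    mu, sigma : parameters of the clean speech model for this channel
    """
    # If the point is foreground, the observation is the speech value itself.
    present = gauss_pdf(x, mu, sigma)
    # If the point is masked, the speech energy is only known to lie in
    # [0, x]; average the model density over that interval.
    missing = (gauss_cdf(x, mu, sigma) - gauss_cdf(0.0, mu, sigma)) / x
    return w * present + (1.0 - w) * missing


def frame_log_likelihood(frame, mask, means, sigmas):
    """Log likelihood of one frame, treating channels as independent."""
    return sum(
        math.log(soft_missing_likelihood(x, w, mu, sigma))
        for x, w, mu, sigma in zip(frame, mask, means, sigmas)
    )
```

With w fixed at 0 or 1 this reduces to a hard (discrete) mask; intermediate values let uncertain regions contribute partial evidence to both interpretations.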


doi: 10.21437/Interspeech.2005-249

Cite as: Coy, A., Barker, J. (2005) Soft harmonic masks for recognising speech in the presence of a competing speaker. Proc. Interspeech 2005, 2641-2644, doi: 10.21437/Interspeech.2005-249

@inproceedings{coy05_interspeech,
  author={André Coy and Jon Barker},
  title={{Soft harmonic masks for recognising speech in the presence of a competing speaker}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={2641--2644},
  doi={10.21437/Interspeech.2005-249}
}