Interspeech'2005 - Eurospeech

Lisbon, Portugal
September 4-8, 2005

Soft Harmonic Masks for Recognising Speech in the Presence of a Competing Speaker

André Coy, Jon Barker

University of Sheffield, UK

This paper addresses the problem of recognising speech in the presence of a competing speaker. It uses a two-stage ‘Speech Fragment Decoding’ system. The system first segments a spectro-temporal representation of the mixture into fragments, such that each fragment is dominated by a single source. The ASR search is then extended to find the combination of speech model sequence and fragment subset that best fits a set of clean speech models. This paper extends previous work by combining ‘Speech Fragment Decoding’ with soft missing data techniques to better handle spectro-temporal regions that cannot be confidently ascribed to either foreground or background. Recognition experiments are performed on a connected digit task using 0 dB mixtures of simultaneous mixed-gender speakers. Incorporating soft decisions increases system performance from 66.9% to 72.2%.
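
To illustrate what a soft missing data decision can look like, the following is a minimal sketch of one common formulation and not necessarily the exact one used in this paper. Each spectral channel carries a weight between 0 and 1 expressing confidence that it is dominated by the target speaker. The channel likelihood interpolates between the ordinary "present" density and a bounded marginal in which the target energy is assumed to lie anywhere between a floor and the observed value. The single diagonal Gaussian per channel, the function names, and the `floor` parameter are all assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gaussian_cdf(x, mean, var):
    """Cumulative distribution of a 1-D Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def soft_missing_data_log_likelihood(obs, means, variances, soft_mask, floor=0.0):
    """Log-likelihood of one spectral frame under a soft mask.

    obs, means, variances and soft_mask are equal-length sequences with one
    entry per frequency channel. soft_mask[i] is the (assumed) confidence in
    [0, 1] that channel i is dominated by the target speaker.
    """
    total = 0.0
    for x, mu, var, w in zip(obs, means, variances, soft_mask):
        # Channel treated as reliable: evaluate the clean speech model directly.
        present = gaussian_pdf(x, mu, var)
        # Channel treated as masked: the target energy is only known to lie
        # between `floor` and the observed value, so average the model
        # density over that interval.
        width = max(x - floor, 1e-6)
        masked = (gaussian_cdf(x, mu, var) - gaussian_cdf(floor, mu, var)) / width
        # Soft decision: interpolate between the two interpretations.
        total += math.log(max(w * present + (1.0 - w) * masked, 1e-300))
    return total
```

With all weights set to 1 this reduces to the usual clean-speech likelihood, and with weights of exactly 0 or 1 it reduces to a hard missing data mask. Intermediate weights allow regions that cannot be confidently assigned to foreground or background to contribute partially under both interpretations.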

