Ninth International Conference on Spoken Language Processing

Pittsburgh, PA, USA
September 17-21, 2006

Combining Missing-Feature Theory, Speech Enhancement and Speaker-Dependent/-Independent Modeling for Speech Separation

Ji Ming (1), Timothy J. Hazen (2), James R. Glass (2)

(1) Queen’s University Belfast, UK (2) Massachusetts Institute of Technology, USA

This paper considers the recognition of speech given in the form of two mixed sentences, spoken by the same talker or by two different talkers. The database published on the ICSLP’2006 website for the Two-Talker Speech Separation Challenge is used in the study. A system that recognizes and reconstructs both sentences from the given mixture is described. The system combines several techniques: a missing-feature approach for improving crosstalk/noise robustness, Wiener filtering for speech restoration, HMM-based speech reconstruction, and speaker-dependent/-independent modeling for speaker/speech recognition. For clean speech recognition, the system obtained a word accuracy rate of 96.7%. For the two-talker speech separation challenge task, the system obtained 81.4% at 6 dB TMR (target-to-masker ratio) and 34.1% at -9 dB TMR.
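
As an illustration of the TMR conditions quoted above, the following minimal sketch shows one common way to define a target-to-masker ratio in decibels and to scale a masker so that a mixture reaches a chosen TMR. It assumes TMR is the ratio of mean signal powers; the function names and the toy sinusoidal signals are hypothetical and are not taken from the paper or from the challenge database.

```python
import math

def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)

def tmr_db(target, masker):
    """Target-to-masker ratio in dB (assumed here to be a mean-power ratio)."""
    return 10.0 * math.log10(power(target) / power(masker))

def mix_at_tmr(target, masker, desired_tmr_db):
    """Scale the masker to reach the desired TMR, then add it to the target."""
    linear_ratio = 10.0 ** (desired_tmr_db / 10.0)
    gain = math.sqrt(power(target) / (power(masker) * linear_ratio))
    scaled_masker = [gain * m for m in masker]
    mixture = [t + m for t, m in zip(target, scaled_masker)]
    return mixture, scaled_masker

# Toy signals standing in for two utterances (hypothetical data).
target = [math.sin(0.05 * n) for n in range(1000)]
masker = [0.3 * math.sin(0.11 * n + 1.0) for n in range(1000)]

# Build a mixture at -9 dB TMR, the hardest condition quoted in the abstract.
mixture, scaled_masker = mix_at_tmr(target, masker, -9.0)
```

At a negative TMR such as -9 dB the masker carries more power than the target, which is consistent with the lower accuracy reported for that condition.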

Bibliographic reference. Ming, Ji / Hazen, Timothy J. / Glass, James R. (2006): "Combining missing-feature theory, speech enhancement and speaker-dependent/-independent modeling for speech separation", in INTERSPEECH-2006, paper 1377-Mon1WeS.6.