ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA2008)
Human listeners are able to understand speech in the presence of a noisy background. Simulating this perceptual ability remains a great challenge. This paper describes a preliminary evaluation of the intelligibility of the output of a monaural speech segregation system. The system performs speech segregation in two stages. The first stage segregates voiced speech using supervised learning of harmonic features, and the second stage segregates unvoiced speech by subtracting noise energy that is estimated from voiced intervals and onset/offset-based segmentation. Objective evaluation in terms of the match to ideal binary time-frequency masks shows substantial improvements. Tests with human subjects indicate that the system improves intelligibility for young listeners when the input SNR is very low, but does not aid elderly listeners. This preliminary evaluation identifies aspects of the system that should be improved in order to produce consistent improvement in intelligibility in noisy environments.
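As an illustration of the objective evaluation mentioned above, the following is a minimal sketch of how an ideal binary time-frequency mask can be built and compared with an estimated mask. It is not the paper's implementation. The function names, the 0 dB local criterion, the toy energy values, and the simple agreement score are all assumptions chosen for illustration; the abstract does not specify the matching measure used.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label each time-frequency unit 1 if its local SNR exceeds lc_db, else 0.

    speech_energy and noise_energy are lists of rows (channels x frames)
    holding the premixed speech and noise energy in each unit.
    The 0 dB default threshold is an assumption, not taken from the paper.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                row.append(0)          # no speech energy: noise dominates
            elif n <= 0.0:
                row.append(1)          # no noise energy: speech dominates
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def mask_agreement(estimated, ideal):
    """Fraction of time-frequency units on which two binary masks agree.

    A hypothetical, deliberately simple score used only for this sketch.
    """
    total = 0
    same = 0
    for e_row, i_row in zip(estimated, ideal):
        for e, i in zip(e_row, i_row):
            total += 1
            same += 1 if e == i else 0
    if total == 0:
        raise ValueError("masks must contain at least one unit")
    return same / total


# Toy example: 2 channels x 2 frames of made-up energies.
speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 2.0], [1.0, 1.0]]
ibm = ideal_binary_mask(speech, noise)

# A hypothetical system output that mislabels one unit.
estimated = [[1, 0], [1, 1]]
score = mask_agreement(estimated, ibm)
```

In this toy case the estimated mask disagrees with the ideal mask on one of four units. A real evaluation would compute the energies from a filterbank decomposition of the premixed speech and noise signals.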
Bibliographic reference. Hu, Ke / Divenyi, Pierre / Ellis, Daniel P. W. / Jin, Zhaozhang / Shinn-Cunningham, Barbara G. / Wang, DeLiang (2008): "Preliminary intelligibility tests of a monaural speech segregation system", In SAPA-2008, 11-16.