ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA2008)

Brisbane, Australia
September 21, 2008

Preliminary Intelligibility Tests of a Monaural Speech Segregation System

Ke Hu (1), Pierre Divenyi (2), Daniel P. W. Ellis (3), Zhaozhang Jin (1), Barbara G. Shinn-Cunningham (4), DeLiang Wang (1)

(1) Department of Computer Science & Engineering and Center for Cognitive Science, The Ohio State University, Columbus, OH, USA
(2) Speech and Hearing Research, VA Medical Center, Martinez, CA, USA
(3) Department of Electrical Engineering, Columbia University, New York, NY, USA
(4) Departments of Cognitive & Neural Systems and Biomedical Engineering, Boston University, Boston, MA, USA

Human listeners are able to understand speech in the presence of a noisy background. Simulating this perceptual ability remains a major challenge. This paper describes a preliminary evaluation of the intelligibility of the output of a monaural speech segregation system. The system performs speech segregation in two stages. The first stage segregates voiced speech using supervised learning of harmonic features, and the second stage segregates unvoiced speech by subtracting noise energy that is estimated from voiced intervals and onset/offset-based segmentation. Objective evaluation in terms of the match to ideal binary time-frequency masks shows substantial improvements. Tests with human subjects indicate that the system improves intelligibility for young listeners when the input SNR is very low, but does not aid elderly listeners. This preliminary evaluation identifies aspects of the system that should be improved in order to produce consistent improvement in intelligibility in noisy environments.
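To make the notion of an ideal binary time-frequency mask concrete, the following minimal sketch builds one from premixed target and noise energies and scores an estimated mask against it. It assumes the common definition in which a time-frequency unit is labeled 1 when its local SNR exceeds a local criterion (0 dB here). The function names, the toy energies, and the simple agreement score are illustrative assumptions, not necessarily the objective measure used in this evaluation.

```python
import math


def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Return a 0/1 mask: 1 where the local SNR (dB) exceeds lc_db.

    Both inputs are equally shaped 2-D lists (time frames x frequency
    channels) of non-negative energies. lc_db is the local criterion
    (hypothetical parameter name, assumed to be 0 dB by default).
    """
    mask = []
    for t_row, n_row in zip(target_energy, noise_energy):
        row = []
        for s, n in zip(t_row, n_row):
            if s <= 0.0:
                # No target energy: the unit cannot be target-dominated.
                row.append(0)
            elif n <= 0.0:
                # Target energy with no noise: target-dominated.
                row.append(1)
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def mask_agreement(estimated, ideal):
    """Fraction of time-frequency units on which the two masks agree."""
    total = 0
    same = 0
    for e_row, i_row in zip(estimated, ideal):
        for e, i in zip(e_row, i_row):
            total += 1
            same += int(e == i)
    return same / total if total else 0.0


# Toy data: 2 time frames x 3 frequency channels (made-up energies).
target = [[4.0, 1.0, 0.0], [9.0, 0.5, 2.0]]
noise = [[1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]

ibm = ideal_binary_mask(target, noise)
estimated = [[1, 0, 0], [1, 1, 0]]  # hypothetical system output
score = mask_agreement(estimated, ibm)
```

In this toy case the estimated mask differs from the ideal mask in one of six units. A unit whose local SNR is exactly 0 dB is labeled 0 because the comparison is strict.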

