INTERSPEECH 2008
9th Annual Conference of the International Speech Communication Association

Brisbane, Australia
September 22-26, 2008

An Entropy Based Feature for Whisper-Island Detection Within Audio Streams

Chi Zhang, John H. L. Hansen

University of Texas at Dallas, USA

Non-neutral speech, especially whispered speech, has a strong negative impact on speech system performance. It is therefore necessary to detect whisper-islands embedded within neutral speech before subsequent processing steps. Detecting whisper-islands in speech audio streams can contribute to improved modeling, speech analysis, and understanding. Speech technology can also benefit, since detection allows sensitive data (names, credit card numbers, etc.) to be suppressed or obscured in audio archives, call centers, or spoken document retrieval systems. This study focuses on detecting whisper-islands embedded in neutral speech within audio streams using a newly proposed entropy-based feature. The new feature is designed to detect vocal effort change points between whisper and neutral speech effectively. Experimental results employing a multi-error score show that the new feature outperforms a previous method introduced in [2]. Overall, detection accuracies of 97% (male) and 96.7% (female) indicate effective whisper-island detection and suggest a viable algorithm to assist speech and language technology when whisper is present.

Bibliographic reference. Zhang, Chi / Hansen, John H. L. (2008): "An entropy based feature for whisper-island detection within audio streams", In INTERSPEECH-2008, 2510-2513.