ISCA Archive Interspeech 2008

An entropy based feature for whisper-island detection within audio streams

Chi Zhang, John H. L. Hansen

Non-neutral speech, especially whispered speech, has a strong negative impact on speech system performance. It is therefore necessary to detect whisper-islands embedded within neutral speech before subsequent processing steps. Detecting whisper-islands in speech audio streams can contribute to improved modeling, speech analysis, and understanding. Speech technology can also benefit, since such detection allows sensitive data (names, credit card numbers, etc.) to be suppressed or obscured in audio archives, call centers, or spoken document retrieval systems. This study focuses on detecting whisper-islands within neutral speech in audio streams using a proposed new entropy-based feature. The new feature is designed to detect vocal effort change points between whispered and neutral speech. Experimental results employing a multi-error score show that the new feature outperforms a previous method introduced in [2]. Overall, detection accuracies of 97% (male) and 96.7% (female) indicate effective whisper-island detection and suggest a viable algorithm to assist speech and language technology when whisper is present.

