This study proposes a frame-based vocal effort likelihood space modeling framework for improved whisper-island detection within normally phonated audio streams. The method first trains traditional Gaussian mixture models for whisper and neutral speech, and then uses them to extract a newly proposed discriminative feature set, Vocal Effort Likelihood (VEL), for whisper-island detection. The VEL feature set is integrated within a BIC/T2-BIC segmentation scheme for vocal effort change point (VECP) detection. Because the VEL feature set is reduced to two dimensions, the proposed framework has lower computational cost than the prior method. Experimental results for whisper-island detection on the UT-VocalEffort II corpus are presented and compared with a previously introduced algorithm. The proposed algorithm improves VECP detection, achieving the lowest Multi-Error Score (MES) of 6.33. It also yields very accurate whisper-island detection, which is useful for sustaining the performance of speech systems (ASR, Speaker-ID, etc.) that may encounter whispered speech. Finally, the proposed algorithm achieves a 100% detection rate, which represents the best whisper-island detection performance, at the lowest computational cost, available in the literature to date.
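The feature extraction step described above can be illustrated with a minimal sketch. It assumes that the 2-D VEL feature of a frame is the pair of log-likelihoods of that frame under the whisper model and the neutral model; the abstract does not give the exact definition, so this is an assumption. The function names, the diagonal-covariance form, and the toy model parameters are all hypothetical, and model training and the BIC/T2-BIC segmentation are not shown.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    # Log density of a diagonal-covariance Gaussian at frame x.
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, weights, means, variances):
    # Log-likelihood of one frame under a GMM, using log-sum-exp for stability.
    terms = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def vel_features(frames, whisper_gmm, neutral_gmm):
    # Assumed VEL definition: (log p(frame | whisper), log p(frame | neutral)).
    return [
        (gmm_loglik(f, *whisper_gmm), gmm_loglik(f, *neutral_gmm))
        for f in frames
    ]

# Hypothetical toy models as (weights, means, variances); real models would be
# trained on acoustic features of whisper and neutral speech.
toy_whisper_gmm = ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]])
toy_neutral_gmm = ([1.0], [[5.0, 5.0]], [[1.0, 1.0]])

toy_frames = [[0.5, 0.5], [5.0, 5.0]]
toy_vel = vel_features(toy_frames, toy_whisper_gmm, toy_neutral_gmm)
```

Under this assumption, each frame maps to a 2-D point regardless of the dimension of the original acoustic feature vector, so any downstream change-point statistic operates on 2-D data, which is consistent with the reduced computational cost the abstract reports.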
Bibliographic reference. Zhang, Chi / Hansen, John H. L. (2011): "Frame-level vocal effort likelihood space modeling for improved whisper-island detection", In INTERSPEECH-2011, 2421-2424.