14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Increasing Speech Intelligibility via Spectral Shaping with Frequency Warping and Dynamic Range Compression Plus Transient Enhancement

Elizabeth Godoy, Yannis Stylianou

FORTH, Greece

To make speech (natural or synthetic) more intelligible for listeners in real-world noisy environments, various modifications that exploit spectral and temporal signal features have been proposed. A previous evaluation campaign involving several approaches showed that a method combining Spectral Shaping (SS) and Dynamic Range Compression (DRC) was highly successful at increasing speech intelligibility. For the public follow-up campaign (the Hurricane Challenge), this work introduces three additional modifications into SSDRC in an attempt to further enhance intelligibility. First, to slow down the articulation rate, the speech is uniformly time-stretched, which effectively increases signal redundancy. Second, a frequency warping mechanism that expands the vowel space is incorporated into the SS. Third, scaling that enhances the transient regions of speech is applied in the time domain along with DRC. Objective evaluations and extensive subjective evaluations (the Hurricane Challenge) show that the new approach achieves intelligibility gains over natural speech in all of the noise conditions evaluated, although, compared to SSDRC, less advantage is observed at higher SNRs.

Bibliographic reference. Godoy, Elizabeth / Stylianou, Yannis (2013): "Increasing speech intelligibility via spectral shaping with frequency warping and dynamic range compression plus transient enhancement", in INTERSPEECH-2013, 3572-3576.