ISCA Archive Interspeech 2013

Preservation of speech spectral dynamics enhances intelligibility

Petko N. Petkov, W. Bastiaan Kleijn

We propose a method for enhancing intelligibility in scenarios where speech is rendered in a noisy environment. The method is based on the hypothesis that intelligibility is a monotonic function of the degree of preservation of the speech spectral dynamics. The accuracy of the speech spectral dynamics can then be traded against the power of the rendered speech signal: we can either maximize the dynamics accuracy given the signal power, or minimize the signal power given the dynamics accuracy. In our implementation, the spectral dynamics is quantified as the difference of the mel cepstra between time frames of the speech signal. We compared the speech rendered by our implementation against both natural speech and a reference method, for the scenario where signal power is minimized given a target dynamics accuracy, and observed significantly improved intelligibility. The low system delay, low complexity, and low memory requirements make the new method particularly suitable for real-time applications.
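
As a rough illustration of the quantity named in the abstract, the sketch below computes the frame-to-frame difference of mel cepstra and compares the dynamics of two signals. It assumes the mel cepstra have already been extracted as one coefficient vector per time frame. The function names `cepstral_deltas` and `dynamics_distortion` are hypothetical, and the mean squared error is an assumed stand-in for the paper's accuracy measure, which the abstract does not specify.

```python
from typing import List, Sequence

Frame = Sequence[float]  # one vector of mel cepstral coefficients per time frame


def cepstral_deltas(cepstra: Sequence[Frame]) -> List[List[float]]:
    """Frame-to-frame difference of mel cepstra: d[t] = c[t + 1] - c[t]."""
    return [
        [nxt - cur for cur, nxt in zip(cepstra[t], cepstra[t + 1])]
        for t in range(len(cepstra) - 1)
    ]


def dynamics_distortion(reference: Sequence[Frame], rendered: Sequence[Frame]) -> float:
    """Mean squared error between two delta trajectories.

    This measure is an assumption made for illustration, not the
    criterion used in the paper.
    """
    ref_d = cepstral_deltas(reference)
    ren_d = cepstral_deltas(rendered)
    if len(ref_d) != len(ren_d):
        raise ValueError("both signals must have the same number of frames")
    count = sum(len(row) for row in ref_d)
    if count == 0:
        return 0.0  # fewer than two frames: no dynamics to compare
    total = sum(
        (a - b) ** 2
        for ref_row, ren_row in zip(ref_d, ren_d)
        for a, b in zip(ref_row, ren_row)
    )
    return total / count


# Toy data: three frames, two cepstral coefficients each (made-up values).
clean = [[1.0, 0.5], [2.0, 0.0], [1.5, 1.0]]
offset = [[2.0, 1.5], [3.0, 1.0], [2.5, 2.0]]     # constant cepstral shift
flattened = [[1.0, 0.5], [1.0, 0.5], [1.0, 0.5]]  # no change over time

print(dynamics_distortion(clean, offset))     # 0.0
print(dynamics_distortion(clean, flattened))  # 0.625
```

In this toy case a constant cepstral offset leaves the dynamics untouched, whereas a signal whose cepstra do not change over time loses them entirely. A measure of this kind would allow dynamics accuracy to be traded against signal power, although the sketch does not implement that optimization.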


doi: 10.21437/Interspeech.2013-773

Cite as: Petkov, P.N., Kleijn, W.B. (2013) Preservation of speech spectral dynamics enhances intelligibility. Proc. Interspeech 2013, 3597-3601, doi: 10.21437/Interspeech.2013-773

@inproceedings{petkov13_interspeech,
  author={Petko N. Petkov and W. Bastiaan Kleijn},
  title={{Preservation of speech spectral dynamics enhances intelligibility}},
  year=2013,
  booktitle={Proc. Interspeech 2013},
  pages={3597--3601},
  doi={10.21437/Interspeech.2013-773}
}