Natural or synthetic speech is increasingly used in less-than-ideal listening conditions. Maximising the likelihood of correct message reception in such situations often leads to a strategy of loud and repetitive renditions of output speech. An alternative approach is to modify the speech signal in ways that increase intelligibility in noise without increasing signal level or duration. The current study focused on the design of stationary spectral modifications that reallocate speech energy across frequency bands. For a range of noise types and levels, frequency band weights were selected using an optimisation procedure based on a genetic algorithm, with glimpse proportion as the objective intelligibility metric. As expected, the optimal energy reallocation depended clearly on both noise type and global signal-to-noise ratio. One unanticipated outcome was the consistent discovery of sparse, highly selective spectral energy weightings, particularly in high-noise conditions. In a subjective test using stationary noise and competing speech maskers, listeners identified significantly more words in sentences as a result of spectral weighting, with increases of up to 15 percentage points. These findings suggest that context-dependent speech output can be used to maintain intelligibility at lower sound output levels.
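The optimisation loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the glimpse proportion here is simplified (it is computed directly on per-band dB levels, with no auditory filterbank or temporal smoothing), the 3 dB local-SNR glimpse threshold and the weight floor are assumed values, and the genetic algorithm is a generic one (tournament selection, uniform crossover, Gaussian mutation, elitism) with illustrative parameters. All function names are hypothetical. Band weights are renormalised so that total speech power stays fixed, mirroring the constraint that intelligibility should improve without raising the signal level.

```python
import math
import random

GLIMPSE_THRESHOLD_DB = 3.0  # assumed local-SNR criterion for a "glimpse"
W_MIN = 1e-3                # assumed floor so no band is ever fully silenced


def normalised_gains(weights, band_power):
    """Scale raw band weights so the total speech power is unchanged."""
    total = sum(band_power)
    weighted = sum(w * p for w, p in zip(weights, band_power))
    return [w * total / weighted for w in weights]


def glimpse_proportion(speech_db, noise_db, gains):
    """Simplified GP: fraction of time-band cells whose local SNR exceeds the threshold.

    speech_db, noise_db: lists of frames, each a list of per-band levels in dB.
    gains: per-band linear power gains applied to the speech.
    """
    glimpsed = cells = 0
    for s_frame, n_frame in zip(speech_db, noise_db):
        for s, n, g in zip(s_frame, n_frame, gains):
            if s + 10.0 * math.log10(g) - n > GLIMPSE_THRESHOLD_DB:
                glimpsed += 1
            cells += 1
    return glimpsed / cells


def tournament(scored, rng, size=3):
    """Return the fittest of `size` randomly drawn (fitness, weights) pairs."""
    return max(rng.sample(scored, size), key=lambda fw: fw[0])[1]


def optimise_weights(speech_db, noise_db, n_bands, pop_size=30,
                     generations=40, mutation_rate=0.1, seed=0):
    """Hypothetical GA: evolve band weights to maximise the simplified GP."""
    rng = random.Random(seed)
    band_power = [sum(10.0 ** (frame[b] / 10.0) for frame in speech_db)
                  for b in range(n_bands)]

    def fitness(w):
        return glimpse_proportion(speech_db, noise_db,
                                  normalised_gains(w, band_power))

    # Seed with the unmodified (flat) weighting so the result is never worse.
    population = [[1.0] * n_bands] + [
        [rng.uniform(W_MIN, 1.0) for _ in range(n_bands)]
        for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(((fitness(w), w) for w in population),
                        key=lambda fw: fw[0], reverse=True)
        next_gen = [w for _, w in scored[:2]]  # elitism: keep the best two
        while len(next_gen) < pop_size:
            a, b = tournament(scored, rng), tournament(scored, rng)
            # Uniform crossover, then clamped Gaussian mutation per band.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [min(1.0, max(W_MIN, g + rng.gauss(0.0, 0.2)))
                     if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return normalised_gains(best, band_power), fitness(best)


if __name__ == "__main__":
    rng = random.Random(1)
    n_bands = 8
    # Toy data: speech level falls with band index; noise is flat at 60 dB.
    speech = [[70.0 - 4.0 * b + rng.gauss(0.0, 6.0) for b in range(n_bands)]
              for _ in range(200)]
    noise = [[60.0] * n_bands for _ in range(200)]
    print("unmodified GP:", glimpse_proportion(speech, noise, [1.0] * n_bands))
    gains, gp = optimise_weights(speech, noise, n_bands)
    print("optimised GP:", gp)
    print("band gains:", [round(g, 2) for g in gains])
```

Because the flat weighting is placed in the initial population and elitism carries the best individuals forward, the returned glimpse proportion can never fall below that of the unmodified speech on the same data. Inspecting the returned gains on real spectro-temporal data is one way to look for the sparse, band-selective weightings the abstract reports, although this toy model makes no claim to reproduce them.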
Index Terms: speech intelligibility, noise, optimisation, genetic algorithm, glimpse proportion
Bibliographic reference. Tang, Yan / Cooke, Martin (2012): "Optimised spectral weightings for noise-dependent speech intelligibility enhancement", In INTERSPEECH-2012, 955-958.