In the modulation-filtering-based speech enhancement method, noise suppression is achieved by bandpass filtering the temporal trajectories of the power spectrum. In the literature, some authors apply modulation filtering to the power spectrum directly, while others first apply a compression function to reduce its dynamic range. This paper systematically compares different dynamic range compression functions applied to the power spectrum for speech enhancement. Subjective listening tests and objective measures are used to evaluate both the quality and the intelligibility of the enhanced speech. Quality is measured objectively with the Perceptual Evaluation of Speech Quality (PESQ) measure, and intelligibility with the Speech Transmission Index (STI). The compression P^0.3333 (the power spectrum raised to the power 1/3, i.e. its cube root) is found to give the highest speech quality and intelligibility.
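The processing chain described above can be sketched as: compress the power spectrum with P^gamma, bandpass filter each frequency bin's trajectory over time, then undo the compression. The following is a minimal illustration, not the authors' implementation. Everything concrete in it is an assumption: the hypothetical function names, the input being an already-computed power spectrogram of shape (frames, bins), a 100 Hz frame rate, a 1 to 16 Hz modulation passband, a 101-tap Hamming-windowed FIR filter, and half-wave rectification before expansion.

```python
import numpy as np


def lowpass_fir(cutoff, num_taps):
    """Hypothetical helper: Hamming-windowed sinc lowpass FIR.

    cutoff is in cycles per frame; taps are normalized to unit DC gain.
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = 2.0 * cutoff * np.sinc(2.0 * cutoff * n) * np.hamming(num_taps)
    return h / h.sum()


def modulation_filter(power_spec, frame_rate, gamma=1.0 / 3.0,
                      band=(1.0, 16.0), num_taps=101):
    """Hypothetical sketch of compressed-domain modulation filtering.

    power_spec: nonnegative array of shape (num_frames, num_bins).
    frame_rate: frames per second (assumed, e.g. 100 for a 10 ms hop).
    gamma:      compression exponent; 1.0 filters the power spectrum
                directly, 1/3 is the cube-root compression.
    band:       assumed modulation passband in Hz.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a centered linear-phase FIR")

    # Bandpass = difference of two unit-DC-gain lowpass filters,
    # so the DC (mean level) of each trajectory is removed.
    lo, hi = band[0] / frame_rate, band[1] / frame_rate
    h = lowpass_fir(hi, num_taps) - lowpass_fir(lo, num_taps)

    # 1. Reduce the dynamic range of the power spectrum.
    compressed = np.asarray(power_spec, dtype=float) ** gamma

    # 2. Filter each frequency bin's temporal trajectory (axis 0 = time).
    filtered = np.apply_along_axis(
        lambda traj: np.convolve(traj, h, mode="same"), 0, compressed)

    # 3. Assumed: clip negative values, then undo the compression.
    return np.maximum(filtered, 0.0) ** (1.0 / gamma)
```

Setting gamma=1.0 corresponds to filtering the power spectrum directly, while gamma=1/3 gives the cube-root compression that the abstract reports as best. The exact filter design, passband, and handling of negative values in the paper may differ from these illustrative choices.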
Bibliographic reference. Lyons, James G. / Paliwal, Kuldip K. (2008): "Effect of compressing the dynamic range of the power spectrum in modulation filtering based speech enhancement", In INTERSPEECH-2008, 387-390.