INTERSPEECH 2011

The goal of speech enhancement algorithms is to provide an estimate of clean speech starting from noisy observations. In general, the estimate is obtained by minimizing a chosen distortion metric. The often-employed cost is the mean-square error (MSE), which results in a Wiener-filter solution. Since the ground truth is not available in practice, the practical utility of the optimal estimators is limited. Alternatively, one can optimize an unbiased estimate of the MSE. This is the key idea behind Stein's unbiased risk estimation (SURE) principle. Within this framework, we derive SURE solutions for the MSE and Itakura-Saito (IS) distortion measures. We also propose parametric versions of the corresponding SURE estimators, which give additional flexibility in controlling the attenuation characteristics for maximum signal-to-noise-ratio (SNR) gain. We compare the performance of the two distortion measures in terms of attenuation profiles, average segmental SNR, global SNR, and spectrograms. We also include a comparison with the standard power spectral subtraction technique. The results show that the IS distortion consistently gives a better performance gain in all respects. The perceived quality of the enhanced speech is also better in the case of the IS metric.
Bibliographic reference. Muraka, Nagarjuna Reddy / Seelamantula, Chandra Sekhar (2011): "A risk-estimation-based comparison of mean square error and Itakura-Saito distortion measures for speech enhancement", In INTERSPEECH-2011, 349-352.
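
To make the SURE idea in the abstract concrete, the following is a minimal sketch and not the estimators derived in the paper. It assumes a simple additive model y = x + n with white Gaussian noise of known variance and a single scalar gain a applied to y. Under those assumptions, Stein's lemma gives an unbiased estimate of the MSE that depends only on y, and minimizing it yields a = 1 - sigma2 / mean(y^2). The function names, the synthetic Gaussian "clean" signal, and the argument order of the Itakura-Saito distortion are all illustrative assumptions.

```python
import numpy as np

def sure_mse_linear(y, a, sigma2):
    """Unbiased estimate of the per-sample MSE of f(y) = a*y.

    Assumes y = x + n with n ~ N(0, sigma2); uses only the noisy data y.
    """
    return (a - 1.0) ** 2 * np.mean(y ** 2) + 2.0 * sigma2 * a - sigma2

def sure_optimal_gain(y, sigma2):
    """Gain minimizing sure_mse_linear over a, clipped at zero."""
    return max(0.0, 1.0 - sigma2 / np.mean(y ** 2))

def itakura_saito(p, p_hat):
    """Mean Itakura-Saito distortion between positive power values.

    Argument order (reference first, estimate second) is an assumption.
    """
    r = p / p_hat
    return np.mean(r - np.log(r) - 1.0)

# Synthetic stand-in for a clean signal (hypothetical data, not speech).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 100_000)
sigma2 = 1.0
y = x + rng.normal(0.0, np.sqrt(sigma2), x.size)

a = sure_optimal_gain(y, sigma2)
sure = sure_mse_linear(y, a, sigma2)      # computed without access to x
true_mse = np.mean((a * y - x) ** 2)      # needs the ground truth x
print(f"gain={a:.3f}  SURE={sure:.3f}  true MSE={true_mse:.3f}")
```

The point of the sketch is that the SURE value tracks the true MSE closely even though it never touches the clean signal, which is what makes it usable as an optimization target in practice.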