The goal of speech enhancement algorithms is to provide an estimate of clean speech starting from noisy observations. In general, the estimate is obtained by minimizing a chosen distortion metric. A commonly employed cost is the mean-square error (MSE), which results in a Wiener-filter solution. Since the ground truth is not available in practice, the practical utility of such optimal estimators is limited. Alternatively, one can optimize an unbiased estimate of the MSE. This is the key idea behind Stein's unbiased risk estimation (SURE) principle. Within this framework, we derive SURE solutions for the MSE and Itakura-Saito (IS) distortion measures. We also propose parametric versions of the corresponding SURE estimators, which give additional flexibility in controlling the attenuation characteristics for maximum signal-to-noise-ratio (SNR) gain. We compare the performance of the two distortion measures in terms of attenuation profiles, average segmental SNR, global SNR, and spectrograms. We also include a comparison with the standard power spectral subtraction technique. The results show that the IS distortion consistently gives a better performance gain in all respects. The perceived quality of the enhanced speech is also better in the case of the IS metric.
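The SURE idea mentioned above can be illustrated with a minimal sketch. It is not the estimator derived in the paper: it assumes a scalar additive model y = x + n with white Gaussian noise of known variance `sigma2`, a single linear gain `a`, and a synthetic Laplacian signal standing in for clean speech. The function names `sure_mse_linear` and `sure_optimal_gain` are hypothetical. Under these assumptions, Stein's lemma gives an estimate of the MSE that depends only on the noisy data, and the gain that minimizes it can be found without access to the clean signal.

```python
import numpy as np

def sure_mse_linear(y, a, sigma2):
    """SURE of the MSE for the shrinkage estimate x_hat = a * y.

    Assumes y = x + n with n ~ N(0, sigma2); uses only the noisy data y.
    """
    return np.mean((a - 1.0) ** 2 * y ** 2 + 2.0 * sigma2 * a - sigma2)

def sure_optimal_gain(y, sigma2):
    """Gain minimizing the SURE above, clipped to the interval [0, 1]."""
    return float(np.clip(1.0 - sigma2 / np.mean(y ** 2), 0.0, 1.0))

# Synthetic stand-in data (an assumption for illustration, not speech).
rng = np.random.default_rng(0)
x = rng.laplace(scale=1.0, size=50_000)      # "clean" signal, variance 2
sigma2 = 0.5                                 # assumed known noise variance
y = x + rng.normal(scale=np.sqrt(sigma2), size=x.size)

a_sure = sure_optimal_gain(y, sigma2)        # computed without using x
est_mse = sure_mse_linear(y, a_sure, sigma2) # risk estimate from y only
true_mse = np.mean((a_sure * y - x) ** 2)    # oracle MSE, needs x

print(f"gain={a_sure:.3f}  SURE={est_mse:.3f}  true MSE={true_mse:.3f}")
```

In this toy setting the SURE value tracks the oracle MSE closely, and the resulting gain approaches the Wiener gain that would be obtained if the clean-signal variance were known.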
Bibliographic reference. Muraka, Nagarjuna Reddy / Seelamantula, Chandra Sekhar (2011): "A risk-estimation-based comparison of mean square error and Itakura-Saito distortion measures for speech enhancement", In INTERSPEECH-2011, 349-352.