Interspeech'2005 - Eurospeech
In this paper, we introduce supergaussian generalized autoregressive conditional heteroscedasticity (GARCH) models for speech signals in the short-time Fourier transform (STFT) domain. We address the problem of speech enhancement, and show that estimating the variances of the STFT expansion coefficients with GARCH models yields higher speech quality than estimating them with the decision-directed method, whether the fidelity criterion is minimum mean-squared error (MMSE) of the spectral coefficients or MMSE of the log-spectral amplitude (LSA). Furthermore, while a Gaussian model is inferior to Gamma and Laplacian models when the variances are estimated by the decision-directed method, a Gaussian model is superior when the variances are estimated by the GARCH modeling method. This facilitates MMSE-LSA estimation, while taking into consideration the heavy-tailed distribution.
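As a rough illustration of what GARCH-based variance estimation means, the sketch below runs the generic textbook GARCH(1,1) recursion, in which each conditional variance depends on the previous squared coefficient magnitude and the previous conditional variance. It is not the paper's STFT-domain formulation or its enhancement algorithm. The function name `garch11_variance`, the parameter names `omega`, `alpha`, `beta`, and the choice to start from the unconditional variance are all assumptions made for this example.

```python
def garch11_variance(samples, omega, alpha, beta):
    """One-step-ahead conditional variances under a generic GARCH(1,1) model.

    Hypothetical helper for illustration only; it implements the textbook
    recursion  var[t] = omega + alpha * |x[t-1]|**2 + beta * var[t-1].
    """
    # Standard constraints: positive, finite unconditional variance.
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")

    # Assumption: initialize with the unconditional (long-run) variance.
    var = omega / (1.0 - alpha - beta)

    variances = []
    for x in samples:
        # var is the variance predicted for x before x is observed.
        variances.append(var)
        # abs() lets the same code accept complex coefficients.
        var = omega + alpha * abs(x) ** 2 + beta * var
    return variances


if __name__ == "__main__":
    # A single large coefficient raises the predicted variance of the next one.
    print(garch11_variance([0.0, 3.0, 0.0], omega=0.1, alpha=0.1, beta=0.8))
```

A large coefficient increases the variance predicted for the following one, and that increase then decays at a rate set by `beta`. This is the volatility-clustering behavior that makes GARCH models a candidate for tracking time-varying spectral variances.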
Bibliographic reference. Cohen, Israel (2005): "Supergaussian GARCH models for speech signals", in INTERSPEECH-2005, 2053-2056.