EUROSPEECH 2003 - INTERSPEECH 2003
Our work introduces a trainable speech enhancement technique that can directly incorporate, prior to the enhancement process, information about the long-term time-frequency characteristics of speech signals. Using the Expectation-Maximization (EM) algorithm, we model two sets of spectral magnitudes as mixtures of Gaussian pdfs. The first is noise, taken from recordings available from the operational environment. The second is clean speech, taken from a clean-speech database. We then apply Bayesian inference to the degraded spectral coefficients and, using Minimum Mean Square Error (MMSE) estimation, derive a closed-form solution for the spectral magnitude estimation task. We evaluate our technique with a focus on real, highly non-stationary noise types (e.g., noise from passing aircraft) and demonstrate its effectiveness at low SNRs.
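The abstract does not give the estimator's equations, so the following is only a minimal sketch under loudly stated assumptions, not the authors' implementation. It assumes a single frequency bin, an additive model y = s + n in the magnitude domain (which ignores phase and the non-negativity of magnitudes), and independent one-dimensional Gaussian mixtures for speech and noise, each fitted with EM. Under those assumptions the MMSE estimate E[s | y] has a closed form. All function names (`fit_gmm_em`, `mmse_estimate`) and the synthetic data are hypothetical.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, k, iters=50, seed=0):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    n = len(data)
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    weights = [1.0 / k] * k
    means = rng.sample(list(data), k)   # initialise means at random data points
    variances = [var_all] * k           # start every component with the global variance
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample
        resp = []
        for x in data:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300    # guard against division by zero on underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # variance floor avoids component collapse
    return weights, means, variances


def mmse_estimate(y, speech_gmm, noise_gmm):
    """E[s | y] for the ASSUMED model y = s + n with independent GMM priors on s and n."""
    sw, sm, sv = speech_gmm
    nw, nm, nv = noise_gmm
    num = den = 0.0
    for a, mu, s2 in zip(sw, sm, sv):
        for b, nu, n2 in zip(nw, nm, nv):
            # unnormalised posterior weight of (speech comp., noise comp.) given y
            lik = a * b * gauss_pdf(y, mu + nu, s2 + n2)
            # Gaussian conditional mean of s given y within this component pair
            cond_mean = mu + s2 / (s2 + n2) * (y - mu - nu)
            num += lik * cond_mean
            den += lik
    return num / den if den > 0.0 else y  # fall back to the noisy value if all weights underflow


# Hypothetical usage on synthetic data for a single frequency bin.
rng = random.Random(1)


def draw_speech():
    # bimodal "speech" magnitudes: low-energy and high-energy frames
    return rng.gauss(1.0, 0.3) if rng.random() < 0.5 else rng.gauss(5.0, 0.7)


def draw_noise():
    # mostly quiet background, occasionally a loud non-stationary event
    return rng.gauss(0.5, 0.3) if rng.random() < 0.7 else rng.gauss(3.0, 0.8)


speech_gmm = fit_gmm_em([draw_speech() for _ in range(1000)], k=2)
noise_gmm = fit_gmm_em([draw_noise() for _ in range(1000)], k=2)

clean = [draw_speech() for _ in range(500)]
noisy = [s + draw_noise() for s in clean]
enhanced = [mmse_estimate(y, speech_gmm, noise_gmm) for y in noisy]

mse_noisy = sum((y - s) ** 2 for y, s in zip(noisy, clean)) / len(clean)
mse_mmse = sum((e - s) ** 2 for e, s in zip(enhanced, clean)) / len(clean)
print(f"MSE noisy: {mse_noisy:.3f}  MSE MMSE: {mse_mmse:.3f}")
```

Under these assumptions each speech/noise component pair yields a Gaussian posterior, so E[s | y] is a weighted sum of per-pair, Wiener-style conditional means. Each weight is the posterior probability of that pair given y. This is what keeps the estimate closed-form; the paper's actual model and derivation may differ.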
Bibliographic reference. Potamitis, Ilyas / Fakotakis, Nikos / Kokkinakis, George (2003): "A trainable speech enhancement technique based on mixture models for speech and noise", In EUROSPEECH-2003, 573-576.