8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


A Trainable Speech Enhancement Technique Based on Mixture Models for Speech and Noise

Ilyas Potamitis, Nikos Fakotakis, George Kokkinakis

University of Patras, Greece

We introduce a trainable speech enhancement technique that directly incorporates information about the long-term time-frequency characteristics of speech signals before enhancement begins. Using the Expectation-Maximization (EM) algorithm, we model both the noise spectral magnitude, taken from recordings of the operational environment, and the clean-speech spectral magnitude, taken from a clean speech database, as mixtures of Gaussian pdfs. We then apply Bayesian inference to the degraded spectral coefficients and, using Minimum Mean Square Error (MMSE) estimation, derive a closed-form solution for the spectral magnitude estimation task. We evaluate the technique on real, highly non-stationary noise types (e.g. noise from passing aircraft) and demonstrate its effectiveness at low SNRs.
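The two stages named in the abstract (EM-trained Gaussian mixtures, then an MMSE estimate) can be illustrated with a minimal sketch. It is not the authors' derivation, which the abstract does not give. It assumes a single frequency bin, scalar observations, and a purely additive model y = s + n in which speech s and noise n are independent and each follows a 1-D Gaussian mixture. Under those assumptions the posterior mean of s given y is available in closed form as a responsibility-weighted sum of per-component-pair linear estimates. The function names `fit_gmm_em` and `mmse_estimate` are hypothetical.

```python
import numpy as np


def fit_gmm_em(x, k, n_iter=100, var_floor=1e-6):
    """Fit a 1-D Gaussian mixture with k components to samples x using plain EM."""
    x = np.asarray(x, dtype=float)
    # Initialise means at evenly spaced quantiles, with shared variance and uniform weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var() + var_floor)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: component responsibilities, computed in the log domain for stability.
        log_p = (np.log(weights)
                 - 0.5 * np.log(2 * np.pi * variances)
                 - 0.5 * (x[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances from the responsibilities.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + var_floor
    return weights, means, variances


def mmse_estimate(y, speech_gmm, noise_gmm):
    """Posterior mean E[s | y] for y = s + n (ASSUMED additive model),
    with independent Gaussian-mixture priors on s and n.

    Each *_gmm argument is a (weights, means, variances) triple.
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    ws, ms, vs = (np.asarray(a, dtype=float) for a in speech_gmm)
    wn, mn, vn = (np.asarray(a, dtype=float) for a in noise_gmm)
    # Enumerate every (speech component i, noise component j) pair, flattened row-major.
    w = (ws[:, None] * wn[None, :]).ravel()
    mu_y = (ms[:, None] + mn[None, :]).ravel()      # mean of y under pair (i, j)
    var_y = (vs[:, None] + vn[None, :]).ravel()     # variance of y under pair (i, j)
    gain = (vs[:, None] / (vs[:, None] + vn[None, :])).ravel()  # Wiener-style gain
    mu_s = np.repeat(ms, len(mn))                   # speech mean for each pair
    # Posterior probability of each pair given the observation y.
    diff = y[:, None] - mu_y
    log_post = np.log(w) - 0.5 * np.log(2 * np.pi * var_y) - 0.5 * diff ** 2 / var_y
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Per-pair conditional mean of s, then average over the pair posterior.
    cond_mean = mu_s + gain * diff
    return (post * cond_mean).sum(axis=1)
```

In this sketch, `fit_gmm_em` would be run once on clean-speech samples and once on noise samples, and the two resulting parameter triples passed to `mmse_estimate` for each noisy observation. Real spectral magnitudes are non-negative and do not add exactly, so the Gaussian additive model here is a simplification chosen to keep the closed form short.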

Bibliographic reference. Potamitis, Ilyas / Fakotakis, Nikos / Kokkinakis, George (2003): "A trainable speech enhancement technique based on mixture models for speech and noise", in EUROSPEECH-2003, 573-576.