13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Low-SNR, Speaker-Dependent Speech Enhancement using GMMs and MFCCs

Laura Boucheron, Phillip L. De Leon

New Mexico State University, Klipsch School of Elect. and Comp. Eng., Las Cruces, NM, USA

In this paper, we propose a two-stage speech enhancement technique. In the training stage, a Gaussian mixture model (GMM) of the mel-frequency cepstral coefficients (MFCCs) of a user's clean speech is computed, wherein the component densities of the GMM serve to model the user's "acoustic classes." In the enhancement stage, MFCCs are computed from a noisy speech signal, and the underlying clean acoustic class is identified via a maximum a posteriori (MAP) decision and a novel mapping matrix. The associated GMM parameters are then used to estimate the MFCCs of the clean speech from the MFCCs of the noisy speech. Finally, the estimated MFCCs are transformed back to a time-domain waveform. Our results show that the technique can improve PESQ scores at SNRs as low as -10 dB.
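The MAP decision over acoustic classes can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM whose weights, means, and variances are already trained; the function names `log_gaussian_diag` and `map_class` and the toy parameter values are hypothetical and not taken from the paper. The sketch does not attempt the mapping matrix or the clean-MFCC estimation step, which the abstract does not specify.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def map_class(x, weights, means, variances):
    """Return (index, posteriors) for the MAP mixture component given x."""
    # Log of the joint p(x, k) = w_k * N(x; mu_k, sigma_k^2) for each component k.
    log_joint = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    # Normalise with log-sum-exp for numerical stability.
    peak = max(log_joint)
    log_norm = peak + math.log(sum(math.exp(lj - peak) for lj in log_joint))
    posteriors = [math.exp(lj - log_norm) for lj in log_joint]
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return best, posteriors


# Toy two-component GMM over 2-D feature vectors (illustrative values only;
# real MFCC vectors would have more dimensions and EM-trained parameters).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

best, posteriors = map_class([4.5, 5.2], weights, means, variances)
print(best, posteriors)
```

Here each mixture component stands in for one "acoustic class"; the component with the largest posterior probability is the MAP choice for that frame.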

Index Terms: Speech enhancement, MFCC, GMM

Bibliographic reference. Boucheron, Laura / De Leon, Phillip L. (2012): "Low-SNR, speaker-dependent speech enhancement using GMMs and MFCCs", In INTERSPEECH-2012, 575-578.