Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization

Li Li, Hirokazu Kameoka, Tomoki Toda, Shoji Makino


Spectral domain speech enhancement algorithms based on non-negative spectrogram models such as non-negative matrix factorization (NMF) and non-negative matrix factor deconvolution are powerful in terms of signal recovery accuracy; however, they do not directly lead to an enhancement in the feature domain (e.g., cepstral domain) or in terms of perceived quality. We have previously proposed a method that makes it possible to enhance speech in the spectral and cepstral domains simultaneously. Although this method was shown to be effective, the devised algorithm was computationally demanding. This paper proposes yet another formulation that allows for a fast implementation by replacing the regularization term with a divergence measure between the NMF model and the mel-generalized cepstral (MGC) representation of the target spectrum. Since the MGC is an auditory-motivated representation of an audio signal widely used in parametric speech synthesis, we also expect the proposed method to improve the perceived quality. Experimental results revealed the effectiveness of the proposed method in terms of both the signal-to-distortion ratio and the cepstral distance.


DOI: 10.21437/Interspeech.2017-1492

Cite as: Li, L., Kameoka, H., Toda, T., Makino, S. (2017) Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization. Proc. Interspeech 2017, 1998-2002, DOI: 10.21437/Interspeech.2017-1492.


@inproceedings{Li2017,
  author={Li Li and Hirokazu Kameoka and Tomoki Toda and Shoji Makino},
  title={Speech Enhancement Using Non-Negative Spectrogram Models with Mel-Generalized Cepstral Regularization},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={1998--2002},
  doi={10.21437/Interspeech.2017-1492},
  url={http://dx.doi.org/10.21437/Interspeech.2017-1492}
}