Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech

Li Li, Hirokazu Kameoka, Takuya Higuchi, Hiroshi Saruwatari


Spectral-domain speech enhancement algorithms based on non-negative matrix factorization (NMF) are powerful in terms of signal recovery accuracy (e.g., signal-to-noise ratio), but they do not necessarily improve the quality of the enhanced speech in the feature domain. This implies that naively using these algorithms as front-end processing for tasks such as speech recognition and speech conversion does not always lead to satisfactory results. To address this problem, this paper proposes a novel method that jointly enhances the spectral and cepstral sequences of noisy speech by optimizing a combined objective function consisting of an NMF-based model-fitting criterion defined in the spectral domain and a Gaussian mixture model (GMM)-based probability distribution defined in the cepstral domain. We derive a novel majorizer for this objective function, which yields a convergence-guaranteed iterative algorithm based on a majorization-minimization scheme. Experimental results revealed that the proposed method outperformed the conventional NMF approach in terms of both signal-to-distortion ratio and cepstral distance.
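To make the shape of the combined objective concrete, the following is a minimal sketch of how such a criterion could be evaluated, not the authors' formulation. Several details are assumptions that the abstract does not specify: the generalized Kullback-Leibler divergence as the NMF fitting criterion, a plain DCT of the log spectrum as the cepstral transform, a diagonal-covariance GMM, a weighting parameter `lam`, and all function and variable names. The majorization-minimization updates derived in the paper are not reproduced here; the sketch only computes the value that such updates would decrease.

```python
import numpy as np

def kl_divergence(Y, X, eps=1e-12):
    """Generalized KL divergence D(Y || X), summed over all bins (assumed criterion)."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))

def dct_matrix(n_ceps, n_freq):
    """DCT-II basis mapping a log spectrum (n_freq bins) to n_ceps coefficients."""
    k = np.arange(n_ceps)[:, None]
    f = np.arange(n_freq)[None, :]
    return np.cos(np.pi * k * (f + 0.5) / n_freq)

def gmm_neg_log_likelihood(C, weights, means, variances):
    """Negative log-likelihood of cepstral frames C (n_ceps x T) under a diagonal GMM.

    weights: (M,), means: (M, n_ceps), variances: (M, n_ceps).
    """
    diff = C[None, :, :] - means[:, :, None]                       # (M, n_ceps, T)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # (M,)
    log_pdf = log_norm[:, None] - 0.5 * np.sum(
        diff ** 2 / variances[:, :, None], axis=1)                 # (M, T)
    log_mix = np.log(weights)[:, None] + log_pdf
    # Log-sum-exp over mixture components for numerical stability.
    mx = log_mix.max(axis=0)
    frame_ll = mx + np.log(np.sum(np.exp(log_mix - mx), axis=0))
    return float(-np.sum(frame_ll))

def joint_objective(Y, W_s, H_s, W_n, H_n, gmm, lam=1.0, eps=1e-12):
    """Spectral fitting term plus lam times the cepstral GMM penalty (hypothetical form).

    Y: observed noisy magnitude spectrogram (n_freq x T).
    W_s, H_s: speech bases and activations; W_n, H_n: noise bases and activations.
    """
    X_s = W_s @ H_s                      # speech spectrogram estimate
    X_n = W_n @ H_n                      # noise spectrogram estimate
    fit = kl_divergence(Y, X_s + X_n, eps)
    D = dct_matrix(gmm["means"].shape[1], Y.shape[0])
    C = D @ np.log(X_s + eps)            # cepstral sequence of the speech estimate
    prior = gmm_neg_log_likelihood(
        C, gmm["weights"], gmm["means"], gmm["variances"])
    return fit + lam * prior

# Toy usage with random non-negative factors (illustrative values only).
rng = np.random.default_rng(0)
n_freq, T, K_s, K_n, n_ceps, M = 16, 10, 4, 2, 5, 3
W_s, H_s = rng.random((n_freq, K_s)) + 0.1, rng.random((K_s, T)) + 0.1
W_n, H_n = rng.random((n_freq, K_n)) + 0.1, rng.random((K_n, T)) + 0.1
Y = W_s @ H_s + W_n @ H_n
gmm = {"weights": np.full(M, 1.0 / M),
       "means": rng.standard_normal((M, n_ceps)),
       "variances": np.ones((M, n_ceps))}
value = joint_objective(Y, W_s, H_s, W_n, H_n, gmm, lam=0.5)
```

In a semi-supervised setting, one common arrangement (assumed here, not stated in the abstract) is to pretrain `W_s` and the GMM on clean speech and to estimate `H_s`, `W_n`, and `H_n` from the noisy input. Setting `lam=0` reduces the sketch to a plain NMF fitting criterion.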


DOI: 10.21437/Interspeech.2016-1286

Cite as

Li, L., Kameoka, H., Higuchi, T., Saruwatari, H. (2016) Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech. Proc. Interspeech 2016, 3753-3757.

BibTeX
@inproceedings{Li+2016,
author={Li Li and Hirokazu Kameoka and Takuya Higuchi and Hiroshi Saruwatari},
title={Semi-Supervised Joint Enhancement of Spectral and Cepstral Sequences of Noisy Speech},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-1286},
url={http://dx.doi.org/10.21437/Interspeech.2016-1286},
pages={3753--3757}
}