An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement

Kehuang Li, Bo Wu, Chin-Hui Lee


We propose an iterative phase recovery framework to improve spectral mapping, and apply it to state-of-the-art speech enhancement systems that use magnitude-based spectral mapping with deep neural networks (DNNs). We further propose to use an estimated time-frequency mask to reduce sign uncertainty in the overlap-add waveform reconstruction algorithm. In a series of enhancement experiments using a DNN baseline system, by directly replacing the original phase of noisy speech with the estimated phase obtained with a classical phase recovery algorithm, the proposed iterative technique reduces the log-spectral distortion (LSD) by 0.41 dB and increases the perceptual evaluation of speech quality (PESQ) score by 0.05 over the DNN baseline, averaged over a wide range of signal and noise conditions. The proposed phase mask mechanism further increases the segmental signal-to-noise ratio (SegSNR) by 0.44 dB, at the expense of a slight degradation in LSD and PESQ compared with the algorithm without a phase mask.
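As a rough illustration of what iterative phase recovery from a fixed magnitude spectrogram can look like, the sketch below uses a Griffin-Lim style loop in NumPy. The abstract does not name the classical algorithm it uses, so the choice of Griffin-Lim is an assumption, and the function names (`stft`, `istft`, `recover_phase`), the Hann window, and the frame and hop sizes are all hypothetical. The proposed phase mask is not modelled here. In an enhancement setting, `magnitude` would presumably be the DNN-estimated magnitude and `init_phase` the phase of the noisy speech.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed STFT; returns an array of shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft + 1)[:-1]  # periodic Hann window (assumed choice)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, n_fft=256, hop=64):
    """Weighted overlap-add synthesis (least-squares inverse of stft)."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(X, n=n_fft, axis=1) * win
    length = n_fft + hop * (len(X) - 1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return y / np.maximum(norm, 1e-8)  # guard against division by zero at the edges

def recover_phase(magnitude, init_phase, n_iter=50, n_fft=256, hop=64):
    """Griffin-Lim style loop: keep the magnitude fixed, re-estimate the phase."""
    X = magnitude * np.exp(1j * init_phase)
    for _ in range(n_iter):
        y = istft(X, n_fft, hop)                  # overlap-add to a waveform
        phase = np.angle(stft(y, n_fft, hop))     # phase of a consistent STFT
        X = magnitude * np.exp(1j * phase)        # re-impose the given magnitude
    return istft(X, n_fft, hop), np.angle(X)
```

Each pass resynthesizes a waveform by overlap-add, re-analyzes it, and keeps only the new phase, so the spectrogram of the output is pulled toward the given magnitude while the phase becomes more consistent across overlapping frames.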


DOI: 10.21437/Interspeech.2016-494

Cite as

Li, K., Wu, B., Lee, C. (2016) An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement. Proc. Interspeech 2016, 3773-3777.

Bibtex
@inproceedings{Li+2016,
author={Kehuang Li and Bo Wu and Chin-Hui Lee},
title={An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement},
year=2016,
booktitle={Interspeech 2016},
doi={10.21437/Interspeech.2016-494},
url={http://dx.doi.org/10.21437/Interspeech.2016-494},
pages={3773--3777}
}