Speaker-Aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement

Fu-Kai Chuang, Syu-Siang Wang, Jeih-weih Hung, Yu Tsao, Shih-Hau Fang


Previous studies indicate that noise and speaker variations can degrade the performance of deep-learning-based speech-enhancement systems. To improve robustness against such variations, we propose a novel speaker-aware system that integrates a deep denoising autoencoder (DDAE) with an embedded speaker identity. The system first extracts a speaker-identity embedding with a neural network model; the DDAE then takes the noisy spectral features, augmented with this embedding, as input and generates enhanced spectra. Guided by the additional embedded features, the speech-enhancement system can produce output tailored to the speaker identity. We evaluated the proposed system on the TIMIT dataset. Experimental results showed that it improves the quality and intelligibility of speech in utterances corrupted by additive noise. In addition, the results suggest that combining speaker features with the enhancement model yields robustness to unseen speakers.
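
The architecture described in the abstract can be summarized with a short sketch. The following is a minimal PyTorch illustration, not the authors' implementation: the class name SpeakerAwareDDAE, the layer sizes, the 257-bin spectra, and the 128-dimensional speaker embedding are assumptions chosen for illustration, and the speaker-identity network that produces the embedding is assumed to be pre-trained and is not shown.

import torch
import torch.nn as nn

class SpeakerAwareDDAE(nn.Module):
    """Minimal sketch of a speaker-aware deep denoising autoencoder.

    Each frame of the noisy log-power spectrum is concatenated with a
    fixed-length speaker embedding (e.g., produced by a pre-trained
    speaker-identity network) and mapped to an estimate of the clean
    log-power spectrum.  Dimensions and depths are illustrative only.
    """

    def __init__(self, spec_dim=257, spk_dim=128, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(spec_dim + spk_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, spec_dim),   # enhanced spectrum per frame
        )

    def forward(self, noisy_spec, spk_embed):
        # noisy_spec: (batch, frames, spec_dim); spk_embed: (batch, spk_dim)
        # Repeat the utterance-level speaker embedding across all frames,
        # then concatenate it with the noisy spectral features.
        spk = spk_embed.unsqueeze(1).expand(-1, noisy_spec.size(1), -1)
        return self.net(torch.cat([noisy_spec, spk], dim=-1))


# Toy training step: frame-wise MSE between enhanced and clean spectra.
model = SpeakerAwareDDAE()
noisy = torch.randn(4, 100, 257)   # batch of noisy log-spectra (dummy data)
clean = torch.randn(4, 100, 257)   # corresponding clean targets (dummy data)
spk = torch.randn(4, 128)          # speaker embeddings from an identity model
loss = nn.functional.mse_loss(model(noisy, spk), clean)
loss.backward()

In this sketch the speaker embedding conditions every frame of the DDAE input, which is one straightforward way to realize the "augmented features" described in the abstract; the actual feature type, concatenation point, and training objective used in the paper may differ.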


 DOI: 10.21437/Interspeech.2019-2108

Cite as: Chuang, F.-K., Wang, S.-S., Hung, J.-W., Tsao, Y., Fang, S.-H. (2019) Speaker-Aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement. Proc. Interspeech 2019, 3173-3177, DOI: 10.21437/Interspeech.2019-2108.


@inproceedings{Chuang2019,
  author={Fu-Kai Chuang and Syu-Siang Wang and Jeih-weih Hung and Yu Tsao and Shih-Hau Fang},
  title={{Speaker-Aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3173--3177},
  doi={10.21437/Interspeech.2019-2108},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2108}
}