ISCA Archive IberSPEECH 2022

The role of window length and shift in complex-domain DNN-based speech enhancement

Celia García-Ruiz, Angel M. Gomez, Juan M. Martín-Doñas

Deep learning techniques have been widely applied to speech enhancement, as they offer the strong modeling capabilities needed for proper speech-noise separation. In contrast to end-to-end approaches, masking-based methods take speech spectra as input to the deep neural network and output spectral masks for noise removal or attenuation. In these approaches, the Short-Time Fourier Transform (STFT), and particularly the parameters of its analysis/synthesis window, play an important but often neglected role. In this paper, we analyze the effects of window length and shift on a deep complex convolution recurrent network (DCCRN), which can provide magnitude and phase corrections separately. Several objective metrics of perceptual quality and intelligibility are used to assess its performance. We observe that phase corrections have a greater impact with shorter window sizes. Similarly, as window overlap increases, phase becomes more relevant than the magnitude spectrum for speech enhancement.
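To make the roles of window length and shift concrete, here is a minimal NumPy sketch of STFT analysis, a complex-domain mask applied in polar form (a real gain rescales each time-frequency bin's magnitude and an angle rotates its phase), and weighted overlap-add synthesis. It is an illustration under stated assumptions, not the paper's DCCRN or its exact mask formulation: the helper names `stft`, `istft` and `apply_complex_mask`, the periodic Hann window, the 16 kHz sampling rate and the window/shift pairs are all hypothetical choices.

```python
import numpy as np

def stft(x, win_len, hop):
    """One-sided STFT of x, shape (frames, win_len // 2 + 1). Hypothetical helper."""
    # Periodic Hann window (assumption); istft below normalizes by the summed
    # squared window, so any overlapping nonzero window can be inverted.
    window = np.hanning(win_len + 1)[:-1]
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1), window

def istft(X, window, hop, length):
    """Weighted overlap-add: least-squares signal estimate from a (possibly masked) STFT."""
    win_len = len(window)
    frames = np.fft.irfft(X, n=win_len, axis=1) * window  # synthesis window
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + win_len] += frame
        norm[i * hop:i * hop + win_len] += window ** 2
    covered = norm > 1e-8  # samples touched by at least one nonzero window value
    y[covered] /= norm[covered]
    return y

def apply_complex_mask(X, mag_mask, phase_shift):
    """Polar-form complex mask: scale each bin's magnitude and rotate its phase."""
    return X * mag_mask * np.exp(1j * phase_shift)

# Usage: how window length and shift trade time resolution for frequency resolution.
fs = 16000  # assumed sampling rate, not taken from the paper
x = np.random.default_rng(0).standard_normal(fs)  # 1 s placeholder signal
for win_ms, hop_ms in [(32, 16), (32, 8), (16, 8)]:  # hypothetical settings
    win_len, hop = fs * win_ms // 1000, fs * hop_ms // 1000
    X, window = stft(x, win_len, hop)
    print(f"{win_ms} ms window, {hop_ms} ms shift: {X.shape[0]} frames, "
          f"{X.shape[1]} bins, {fs / win_len:.2f} Hz per bin")
```

Shortening the window yields fewer, wider frequency bins, so each bin mixes more harmonics and a purely magnitude-based correction has less spectral detail to work with. Shortening the shift raises the overlap and the number of frames. Because the mask in this sketch keeps magnitude gain and phase rotation as separate arrays, their individual contributions can be inspected under each setting.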


doi: 10.21437/IberSPEECH.2022-30

Cite as: García-Ruiz, C., Gomez, A.M., Martín-Doñas, J.M. (2022) The role of window length and shift in complex-domain DNN-based speech enhancement. Proc. IberSPEECH 2022, 146-150, doi: 10.21437/IberSPEECH.2022-30

@inproceedings{garciaruiz22_iberspeech,
  author={Celia García-Ruiz and Angel M. Gomez and Juan M. Martín-Doñas},
  title={{The role of window length and shift in complex-domain DNN-based speech enhancement}},
  year=2022,
  booktitle={Proc. IberSPEECH 2022},
  pages={146--150},
  doi={10.21437/IberSPEECH.2022-30}
}