ISCA Archive L3DAS 2022

A Perceptual Loss Based Complex Neural Beamforming for Ambix 3D Speech Enhancement

Heitor R. Guimaraes, Wesley Beccaro, Miguel A. Ramirez

This work proposes a novel approach to B-format AmbiX 3D speech enhancement based on the short-time Fourier transform (STFT) representation. The model is a Fully Complex Convolutional Network (FC2N) that estimates a mask, which is applied to the input features. A final layer then converts the B-format signal into a monaural representation, to which we apply the inverse STFT (ISTFT). For optimization, we use a compound loss function, applied in the time domain, that combines a term based on the short-time objective intelligibility (STOI) metric with a perceptual loss computed on top of the wav2vec 2.0 model. We apply the approach to Task 1 of the L3DAS22 challenge, where our model achieves a score of 0.845 on the challenge metric, using a subset of the development set as reference.
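The masking and B-format-to-mono steps can be illustrated with a minimal NumPy sketch. It assumes the input is the STFT of a first-order AmbiX signal (four channels in ACN order W, Y, Z, X) with shape (channels, frequency bins, frames). The function name `masked_filter_and_sum` and the per-channel, per-frequency complex `weights` are hypothetical: we model the final layer as a simple filter-and-sum combination, which is an assumption and not necessarily the authors' layer. In the actual FC2N the mask is predicted by the network and the combination is learned.

```python
import numpy as np


def masked_filter_and_sum(spec: np.ndarray, mask: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Apply a complex mask to a multichannel STFT and collapse it to mono.

    spec:    complex STFT of the B-format input, shape (C, F, T) with C = 4.
    mask:    complex mask (network output in the real model), same shape as spec.
    weights: hypothetical complex weights per channel and frequency, shape (C, F),
             standing in for the final channel-combining layer.
    Returns a mono complex spectrogram of shape (F, T), ready for an inverse STFT.
    """
    if spec.shape != mask.shape:
        raise ValueError("mask must match the spectrogram shape")
    # Complex multiplication scales the magnitude and rotates the phase of each bin.
    masked = mask * spec
    # Filter-and-sum: weight each channel per frequency bin, then sum over channels.
    return np.einsum("cf,cft->ft", weights, masked)


# Usage: with an all-ones mask and weights that select only channel 0
# (the omnidirectional W channel in ACN order), the output equals that channel.
rng = np.random.default_rng(0)
C, F, T = 4, 257, 100
spec = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
mask = np.ones((C, F, T), dtype=complex)
weights = np.zeros((C, F), dtype=complex)
weights[0] = 1.0
mono = masked_filter_and_sum(spec, mask, weights)
print(mono.shape)  # (257, 100)
```

The mono spectrogram would then be passed to an ISTFT (for example `scipy.signal.istft`) to obtain the time-domain waveform on which the loss is computed.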
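For context on the reported score of 0.845: as we understand the L3DAS challenge rules, the Task 1 metric averages STOI with one minus the word error rate (WER) of transcripts of the enhanced speech, M = (STOI + (1 - WER)) / 2. The sketch below computes WER as a word-level Levenshtein distance and combines it with a given STOI value; the function names and the example sentences and STOI value are illustrative only, and the challenge's own evaluation scripts remain the reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming Levenshtein distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1] / max(len(ref), 1)


def l3das_task1_metric(stoi: float, wer: float) -> float:
    """Combined Task 1 metric: average of STOI and (1 - WER)."""
    return (stoi + (1.0 - wer)) / 2.0


# Usage: one deleted word out of six gives WER = 1/6.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
score = l3das_task1_metric(stoi=0.9, wer=wer)
print(round(score, 4))  # 0.8667
```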
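The structure of the compound time-domain loss can be sketched as an intelligibility term plus a weighted feature-matching term. This is a minimal NumPy illustration under stated assumptions: the weight `alpha`, the use of (1 - STOI), and the mean absolute (L1) feature distance are our choices, not details confirmed by the abstract. `toy_stoi` and `toy_features` are hypothetical placeholders; they are NOT the real STOI algorithm or wav2vec 2.0, and a trainable version would need a differentiable STOI and a pretrained wav2vec 2.0 encoder in an autodiff framework.

```python
from typing import Callable

import numpy as np


def compound_loss(
    estimate: np.ndarray,
    reference: np.ndarray,
    stoi_fn: Callable[[np.ndarray, np.ndarray], float],
    feature_fn: Callable[[np.ndarray], np.ndarray],
    alpha: float = 1.0,  # hypothetical weight of the perceptual term
) -> float:
    # Intelligibility term: higher STOI is better, so minimize (1 - STOI).
    intelligibility_term = 1.0 - stoi_fn(estimate, reference)
    # Perceptual term: L1 distance between features of a pretrained model
    # (wav2vec 2.0 in the paper; a toy stand-in here).
    perceptual_term = float(np.mean(np.abs(feature_fn(estimate) - feature_fn(reference))))
    return intelligibility_term + alpha * perceptual_term


def toy_stoi(est: np.ndarray, ref: np.ndarray) -> float:
    # Placeholder: Pearson correlation, NOT the STOI algorithm.
    return float(np.corrcoef(est, ref)[0, 1])


def toy_features(x: np.ndarray, frame: int = 160) -> np.ndarray:
    # Placeholder for wav2vec 2.0 hidden states: per-frame log energy.
    n = len(x) // frame
    frames = x[: n * frame].reshape(n, frame)
    return np.log(np.mean(frames ** 2, axis=1) + 1e-8)


# Usage: a perfect estimate gives (near) zero loss; a noisy one is penalized.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.5 * rng.standard_normal(16000)
loss_same = compound_loss(clean, clean, toy_stoi, toy_features)
loss_noisy = compound_loss(noisy, clean, toy_stoi, toy_features)
```

Passing the two terms in as callables keeps the sketch agnostic to how STOI and the wav2vec 2.0 features are actually computed.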


doi: 10.21437/L3DAS.2022-4

Cite as: Guimaraes, H.R., Beccaro, W., Ramirez, M.A. (2022) A Perceptual Loss Based Complex Neural Beamforming for Ambix 3D Speech Enhancement. Proc. L3DAS22: Machine Learning for 3D Audio Signal Processing, 16-20, doi: 10.21437/L3DAS.2022-4

@inproceedings{guimaraes22_l3das,
  author={Heitor R. Guimaraes and Wesley Beccaro and Miguel A. Ramirez},
  title={{A Perceptual Loss Based Complex Neural Beamforming for Ambix 3D Speech Enhancement}},
  year=2022,
  booktitle={Proc. L3DAS22: Machine Learning for 3D Audio Signal Processing},
  pages={16--20},
  doi={10.21437/L3DAS.2022-4}
}