ISCA Archive L3DAS 2022

L3DAS22: Exploring Loss Functions for 3D Speech Enhancement

Tyler Vuong, Mark Lindsey, Yangyang Xia, Richard Stern

This work explores the effects of different speech enhancement loss functions traditionally used for monophonic signals when applied to the L3DAS22 Challenge 3D Speech Enhancement Task. In addition to baseline time-domain losses, loss functions in the time-frequency and modulation domains are introduced to a common network. These losses are compared first by their effect on system performance for the task, and then by their correlation with important speech enhancement metrics, such as word error rate (WER) and short-time objective intelligibility (STOI). Findings show that the Phase-Constrained Magnitude (PCM) loss paired with modulation loss improved performance by 12.1% relative to the L3DAS22 baseline in terms of the challenge's evaluation metric. It was also found that the modulation distance is consistently more correlated with WER and STOI than other metrics.


doi: 10.21437/L3DAS.2022-1

Cite as: Vuong, T., Lindsey, M., Xia, Y., Stern, R. (2022) L3DAS22: Exploring Loss Functions for 3D Speech Enhancement. Proc. L3DAS22: Machine Learning for 3D Audio Signal Processing, 1-5, doi: 10.21437/L3DAS.2022-1

@inproceedings{vuong22_l3das,
  author={Tyler Vuong and Mark Lindsey and Yangyang Xia and Richard Stern},
  title={{L3DAS22: Exploring Loss Functions for 3D Speech Enhancement}},
  year=2022,
  booktitle={Proc. L3DAS22: Machine Learning for 3D Audio Signal Processing},
  pages={1--5},
  doi={10.21437/L3DAS.2022-1}
}