ISCA Archive Interspeech 2021

Development of a Psychoacoustic Loss Function for the Deep Neural Network (DNN)-Based Speech Coder

Joon Byun, Seungmin Shin, Youngcheol Park, Jongmo Sung, Seungkwon Beack

This paper presents a loss function that compensates for the perceptual loss of deep neural network (DNN)-based speech coders. Using a psychoacoustic model (PAM), we design a loss function that maximizes the mask-to-noise ratio (MNR) on multi-resolution Mel-frequency scales. In addition, a perceptual entropy (PE)-based weighting scheme is incorporated into the MNR loss so that the DNN model focuses more on perceptually important Mel-frequency bands. The proposed loss function was tested on a CNN-based autoencoder that implements softmax quantization and entropy-based bitrate control. Objective and subjective tests conducted with speech signals showed that the proposed loss function yielded higher perceptual quality than previous perceptual loss functions.
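
As a rough illustration of the idea in the abstract, the sketch below computes a band-weighted noise-to-mask ratio in NumPy; minimizing it is equivalent to maximizing a weighted MNR. It is not the authors' formulation: the function names `pe_weights` and `weighted_nmr_loss` are hypothetical, the per-band signal energy, noise energy, and masking threshold are assumed to be given already (no psychoacoustic model is implemented here), the weights only loosely follow a perceptual-entropy-style expression, and the multi-resolution aspect is omitted.

```python
import numpy as np


def pe_weights(signal_energy, mask_threshold, eps=1e-12):
    """Per-band weights from a perceptual-entropy-like measure (an assumption).

    Bands whose energy lies far above the masking threshold receive
    larger weights. The weights are normalized to sum to (about) one.
    """
    signal_energy = np.asarray(signal_energy, dtype=float)
    mask_threshold = np.asarray(mask_threshold, dtype=float)
    pe = np.log2(1.0 + np.sqrt(signal_energy / (mask_threshold + eps)))
    return pe / (pe.sum() + eps)


def weighted_nmr_loss(noise_energy, signal_energy, mask_threshold, eps=1e-12):
    """Weighted noise-to-mask ratio in dB, summed over bands.

    noise_energy is the per-band energy of the coding error. A lower
    value means the noise sits further below the masking threshold,
    i.e. a higher mask-to-noise ratio.
    """
    noise_energy = np.asarray(noise_energy, dtype=float)
    mask_threshold = np.asarray(mask_threshold, dtype=float)
    nmr_db = 10.0 * np.log10((noise_energy + eps) / (mask_threshold + eps))
    weights = pe_weights(signal_energy, mask_threshold, eps)
    return float(np.sum(weights * nmr_db))


# Toy two-band example with made-up energies.
signal = [1.0, 1.0]
mask = [0.1, 0.1]
loss_quiet_noise = weighted_nmr_loss([0.01, 0.01], signal, mask)
loss_loud_noise = weighted_nmr_loss([0.1, 0.1], signal, mask)
```

In this toy case the quieter coding noise gives the lower loss, which is the behavior a training objective of this kind would reward. In an actual training setup the same computation would be written in a framework with automatic differentiation.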


doi: 10.21437/Interspeech.2021-2151

Cite as: Byun, J., Shin, S., Park, Y., Sung, J., Beack, S. (2021) Development of a Psychoacoustic Loss Function for the Deep Neural Network (DNN)-Based Speech Coder. Proc. Interspeech 2021, 1694-1698, doi: 10.21437/Interspeech.2021-2151

@inproceedings{byun21_interspeech,
  author={Joon Byun and Seungmin Shin and Youngcheol Park and Jongmo Sung and Seungkwon Beack},
  title={{Development of a Psychoacoustic Loss Function for the Deep Neural Network (DNN)-Based Speech Coder}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1694--1698},
  doi={10.21437/Interspeech.2021-2151}
}