Dithered Quantization for Frequency-Domain Speech and Audio Coding

Tom Bäckström, Johannes Fischer, Sneha Das


A common issue in frequency-domain speech and audio coding, which appears as the bitrate decreases, is that quantization levels become increasingly sparse. At low accuracy, high-frequency components are typically quantized to zero, which leads to a muffled output signal and musical noise. Bandwidth extension and noise-filling methods attempt to treat the problem by inserting noise of similar energy as the original signal, at the cost of a low signal-to-noise ratio. Dithering methods, however, provide an alternative approach in which both accuracy and energy are retained. We propose a hybrid coding approach where low-energy samples are quantized using dithering instead of the conventional uniform quantizer. For dithering, we apply 1-bit quantization in a randomized sub-space. We further show that the output energy can be adjusted to the desired level using a scaling parameter. Objective measurements and listening tests demonstrate the advantages of the proposed methods.
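
As a rough illustration of the idea described in the abstract, the following Python sketch contrasts a coarse uniform quantizer with 1-bit sign quantization in a randomly rotated domain. It is not the authors' implementation: the function names, the use of a seeded random orthogonal matrix as the "randomized sub-space", and the energy-matching choice of the gain are all assumptions made for this example.

```python
import numpy as np

def uniform_quantize(x, step):
    """Conventional uniform quantizer: round to the nearest multiple of step."""
    return step * np.round(x / step)

def random_orthogonal(n, seed):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # flip column signs; q stays orthogonal

def one_bit_dithered(x, seed, gain):
    """Assumed scheme: keep one sign bit per sample in a rotated domain."""
    p = random_orthogonal(len(x), seed)  # encoder and decoder share the seed
    bits = np.sign(p @ x)                # 1 bit per coefficient
    bits[bits == 0] = 1.0                # avoid a third symbol for exact zeros
    return gain * (p.T @ bits)           # decoder: rotate back and scale

# Low-energy samples, as found at high frequencies at low bitrates.
rng = np.random.default_rng(0)
x = 0.05 * rng.standard_normal(16)

# A coarse uniform quantizer rounds every sample to zero.
x_uniform = uniform_quantize(x, step=1.0)

# Assumed scaling: choose the gain so the output energy equals the input energy.
gain = np.linalg.norm(x) / np.sqrt(len(x))
x_dither = one_bit_dithered(x, seed=1234, gain=gain)

print("uniform energy: ", np.sum(x_uniform ** 2))
print("dithered energy:", np.sum(x_dither ** 2))
print("input energy:   ", np.sum(x ** 2))
```

In this sketch the rotation is orthogonal, so the reconstructed vector has energy gain² · n regardless of the input, which is why a single scaling parameter suffices to set the output energy. The reconstruction is also always positively correlated with the input, whereas the all-zero output of the coarse uniform quantizer carries neither energy nor signal information.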


DOI: 10.21437/Interspeech.2018-46

Cite as: Bäckström, T., Fischer, J., Das, S. (2018) Dithered Quantization for Frequency-Domain Speech and Audio Coding. Proc. Interspeech 2018, 3533-3537, DOI: 10.21437/Interspeech.2018-46.


@inproceedings{Bäckström2018,
  author={Tom Bäckström and Johannes Fischer and Sneha Das},
  title={Dithered Quantization for Frequency-Domain Speech and Audio Coding},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={3533--3537},
  doi={10.21437/Interspeech.2018-46},
  url={http://dx.doi.org/10.21437/Interspeech.2018-46}
}