Audio super-resolution (ASR) aims to reconstruct a high-resolution
signal from its corresponding low-resolution one, a task that becomes
difficult when the correlation between the two is low.
In this paper, we
propose a learning model, QISTA-Net-Audio, that solves ASR as a
linear inverse problem. QISTA-Net-Audio consists of two components.
First, an audio waveform can be represented in the frequency domain
as a complex-valued spectrum composed of a real and an imaginary part.
We treat the real and imaginary parts as a two-channel image, predict
a high-resolution spectrum from the viewpoint of image reconstruction,
and keep only its phase information. Second, we predict the magnitude
information by solving a sparse signal reconstruction problem. By
combining the predicted magnitude and phase, we recover the
high-resolution waveform. A comparison with the state-of-the-art
method MfNet [1], in terms of the evaluation metrics SNR, PESQ, and STOI,
demonstrates the superior performance of our method.
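The two-stage pipeline above can be sketched in a minimal form: an ISTA-style iteration stands in for the sparse magnitude reconstruction, and the inverse STFT recombines a magnitude with a phase. This is an illustration only, not the paper's method: the learned QISTA-Net-Audio layers and the non-convex ℓ_q proximal operator are replaced by plain soft-thresholding, and the signal, matrix `A`, and parameters `beta`/`lam` are toy placeholders.

```python
# Illustrative sketch only. Soft-thresholding stands in for the learned
# non-convex l_q step of QISTA-Net-Audio; all data here are synthetic.
import numpy as np
from scipy.signal import stft, istft

def ista_step(x, y, A, beta=0.2, lam=0.005):
    """One ISTA iteration: gradient step on ||y - Ax||^2 followed by
    soft-thresholding (a stand-in for the l_q proximal operator)."""
    r = x + beta * A.T @ (y - A @ x)                       # gradient step
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)   # shrinkage

# Toy sparse recovery: y = A @ x_true with a 3-sparse x_true.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128)) / np.sqrt(64)
x_true = np.zeros(128)
x_true[[3, 40, 99]] = [1.5, -2.0, 1.0]
y = A @ x_true
x = np.zeros(128)
for _ in range(500):
    x = ista_step(x, y, A)

# Magnitude/phase recombination: split a complex spectrum into magnitude
# and phase, then invert the STFT on their product (here the magnitude is
# left unchanged, so the waveform is reconstructed).
fs = 16000
wave = rng.standard_normal(fs)                 # placeholder 1-second signal
f, t, Z = stft(wave, fs=fs, nperseg=512)
mag, phase = np.abs(Z), np.angle(Z)
_, recon = istft(mag * np.exp(1j * phase), fs=fs, nperseg=512)
```

In the actual model, `mag` would come from the sparse-reconstruction branch and `phase` from the image-reconstruction branch before the inverse STFT.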
Cite as: Lin, G.-X., Hu, S.-W., Lu, Y.-J., Tsao, Y., Lu, C.-S. (2021) QISTA-Net-Audio: Audio Super-Resolution via Non-Convex ℓ_q-Norm Minimization. Proc. Interspeech 2021, 1639-1643, doi: 10.21437/Interspeech.2021-670
@inproceedings{lin21c_interspeech,
  author={Gang-Xuan Lin and Shih-Wei Hu and Yen-Ju Lu and Yu Tsao and Chun-Shien Lu},
  title={{QISTA-Net-Audio: Audio Super-Resolution via Non-Convex ℓ_q-Norm Minimization}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1639--1643},
  doi={10.21437/Interspeech.2021-670}
}