ISCA Archive Interspeech 2021

QISTA-Net-Audio: Audio Super-Resolution via Non-Convex ℓ_q-Norm Minimization

Gang-Xuan Lin, Shih-Wei Hu, Yen-Ju Lu, Yu Tsao, Chun-Shien Lu

Audio super-resolution (ASR) aims to reconstruct a high-resolution signal from its low-resolution counterpart, which is difficult when the correlation between the two is low.

In this paper, we propose a learning model, QISTA-Net-Audio, that solves ASR as a linear inverse problem. QISTA-Net-Audio is composed of two components. First, an audio waveform can be represented in the frequency domain as a complex-valued spectrum, which consists of a real part and an imaginary part. Treating the real and imaginary parts as an image, we predict a high-resolution spectrum by image reconstruction but keep only its phase information. Second, we predict the magnitude information by solving a sparse signal reconstruction problem. By combining the predicted magnitude with the predicted phase, we can recover the high-resolution waveform. Comparison with the state-of-the-art method MfNet [1] in terms of the metrics SNR, PESQ, and STOI demonstrates the superior performance of our method.
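
The final recombination step described above (magnitude from one branch, phase from the other) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dft`, `idft`, and `combine_magnitude_and_phase` are hypothetical, a plain discrete Fourier transform on a single short frame stands in for whatever time-frequency transform the paper uses, and both network branches are replaced by toy stand-ins derived from a known signal.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def combine_magnitude_and_phase(magnitude, phase_source):
    """Build a complex spectrum from one branch's magnitude and another's phase.

    Only the phase angle of each bin in `phase_source` is kept; its
    magnitude is discarded and replaced by the entry in `magnitude`.
    """
    return [cmath.rect(m, cmath.phase(z)) for m, z in zip(magnitude, phase_source)]


# Toy "high-resolution" frame: two sinusoids over 16 samples.
signal = [
    math.sin(2 * math.pi * 3 * t / 16) + 0.5 * math.cos(2 * math.pi * 5 * t / 16)
    for t in range(16)
]
spectrum = dft(signal)

# Stand-in for the magnitude branch: here simply the true magnitudes.
magnitude = [abs(z) for z in spectrum]

# Stand-in for the phase branch: a spectrum whose magnitudes are wrong
# (scaled by 0.3) but whose phase is right, to show only the phase is used.
phase_branch = [0.3 * z for z in spectrum]

recombined = combine_magnitude_and_phase(magnitude, phase_branch)
waveform = [value.real for value in idft(recombined)]

# Largest sample-wise deviation from the original frame.
max_error = max(abs(a - b) for a, b in zip(signal, waveform))
print(f"max reconstruction error: {max_error:.2e}")
```

Because the stand-in branches carry the exact magnitude and phase, the recovered frame matches the original up to floating-point error. In the actual model both quantities are predictions, so the quality of the waveform depends on how accurate each branch is.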


doi: 10.21437/Interspeech.2021-670

Cite as: Lin, G.-X., Hu, S.-W., Lu, Y.-J., Tsao, Y., Lu, C.-S. (2021) QISTA-Net-Audio: Audio Super-Resolution via Non-Convex ℓ_q-Norm Minimization. Proc. Interspeech 2021, 1639-1643, doi: 10.21437/Interspeech.2021-670

@inproceedings{lin21c_interspeech,
  author={Gang-Xuan Lin and Shih-Wei Hu and Yen-Ju Lu and Yu Tsao and Chun-Shien Lu},
  title={{QISTA-Net-Audio: Audio Super-Resolution via Non-Convex ℓ_q-Norm Minimization}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1639--1643},
  doi={10.21437/Interspeech.2021-670}
}