Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric

Ryandhimas E. Zezario, Szu-Wei Fu, Xugang Lu, Hsin-Min Wang, Yu Tsao


Previous studies have shown that a specialized speech enhancement model can outperform a general model when the test condition matches the training condition. Choosing the correct (matched) model from an ensemble of candidate models is therefore critical to achieving generalizability. Ideally, the decision criterion would be based directly on the evaluation metric, but that metric requires a clean reference signal, which makes it impractical in deployment. In this paper, we propose a novel specialized speech enhancement model selection (SSEMS) approach that solves this problem by applying a non-intrusive quality estimation model, termed Quality-Net. Experimental results first confirm the effectiveness of the proposed SSEMS approach. Moreover, we observe that the accuracy of Quality-Net in choosing the most suitable model increases as the SNR of the noisy input increases; as a result, the proposed systems outperform both an auto-encoder-based model selection method and a general model, particularly under high-SNR conditions.
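
The selection idea can be illustrated with a minimal sketch. It assumes, which the abstract does not state, that each candidate model enhances the noisy input and that the output with the highest estimated quality score is kept. The names `select_enhanced`, `toy_quality`, and the toy candidate models are hypothetical placeholders; `toy_quality` merely stands in for a learned non-intrusive estimator such as Quality-Net and is not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple

Signal = List[float]


def select_enhanced(
    noisy: Signal,
    models: Dict[str, Callable[[Signal], Signal]],
    quality_fn: Callable[[Signal], float],
) -> Tuple[str, Signal, float]:
    """Run every candidate model and keep the output with the best score.

    quality_fn is non-intrusive: it sees only the enhanced signal,
    never a clean reference.
    """
    if not models:
        raise ValueError("at least one candidate model is required")

    best_name, best_out, best_score = "", [], float("-inf")
    for name, enhance in models.items():
        out = enhance(noisy)
        score = quality_fn(out)
        if score > best_score:
            best_name, best_out, best_score = name, out, score
    return best_name, best_out, best_score


def toy_quality(signal: Signal) -> float:
    """Hypothetical stand-in for a learned quality estimator.

    It simply rewards low total magnitude; a real system would call a
    trained model here.
    """
    return -sum(abs(v) for v in signal)


# Toy "specialized models": plain functions standing in for trained networks.
toy_models = {
    "halve": lambda x: [0.5 * v for v in x],
    "identity": lambda x: list(x),
}

name, enhanced, score = select_enhanced([0.5, -0.5, 1.0], toy_models, toy_quality)
print(name, enhanced, score)
```

Because the scoring function needs no clean reference, the same loop can run at test time, which is the practical constraint the abstract highlights.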


DOI: 10.21437/Interspeech.2019-2425

Cite as: Zezario, R.E., Fu, S., Lu, X., Wang, H., Tsao, Y. (2019) Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric. Proc. Interspeech 2019, 3168-3172, DOI: 10.21437/Interspeech.2019-2425.


@inproceedings{Zezario2019,
  author={Ryandhimas E. Zezario and Szu-Wei Fu and Xugang Lu and Hsin-Min Wang and Yu Tsao},
  title={{Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={3168--3172},
  doi={10.21437/Interspeech.2019-2425},
  url={http://dx.doi.org/10.21437/Interspeech.2019-2425}
}