INTERSPEECH 2004 - ICSLP
We propose a novel method to estimate the quality of coded speech signals. The joint probability distribution of the subjective mean opinion score (MOS) and perceptual distortion feature variables is modelled using a Gaussian mixture density. The feature variables are selected from a large pool of candidate features using statistical data mining techniques. We study which combinations of features and mixture model configurations are most effective. For our speech database, a five-feature, three-component Gaussian mixture model (GMM) yields approximately 18% lower root-mean-squared MOS estimation error than ITU-T P.862 PESQ, the current best standard algorithm.
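To illustrate how a joint Gaussian mixture over MOS and features can produce a MOS estimate, the sketch below computes the conditional mean of MOS given a feature vector. The abstract does not state which estimator is used; the conditional mean is assumed here because it is the usual minimum mean-squared-error choice. The function names, the parameter layout (MOS at index 0, features after it) and the toy parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at one point x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)


def conditional_mos(x, weights, means, covs):
    """Conditional mean E[MOS | x] under a joint GMM over (MOS, features).

    Assumed layout (hypothetical): index 0 of every mean vector and
    covariance matrix is MOS; the remaining indices are the features.
    """
    x = np.asarray(x, dtype=float)
    log_resp = []    # unnormalised log responsibilities given x only
    cond_means = []  # per-component conditional mean of MOS given x
    for w, mu, S in zip(weights, means, covs):
        mu = np.asarray(mu, dtype=float)
        S = np.asarray(S, dtype=float)
        mu_y, mu_x = mu[0], mu[1:]
        S_yx, S_xx = S[0, 1:], S[1:, 1:]
        log_resp.append(np.log(w) + gaussian_logpdf(x, mu_x, S_xx))
        cond_means.append(mu_y + S_yx @ np.linalg.solve(S_xx, x - mu_x))
    log_resp = np.array(log_resp)
    resp = np.exp(log_resp - log_resp.max())  # subtract max for stability
    resp /= resp.sum()
    return float(resp @ np.array(cond_means))


# Toy two-component model with one feature (illustrative values only).
toy_weights = [0.5, 0.5]
toy_means = [[2.0, -1.0], [4.0, 1.0]]
toy_covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
estimate = conditional_mos([0.0], toy_weights, toy_means, toy_covs)
```

In this toy model the feature value 0.0 is equally likely under both components, so the estimate is the average of the two component MOS means. A model of the size described in the abstract would use three components and five features, with parameters fitted to a labelled speech database.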
Bibliographic reference. Falk, Tiago / Chan, Wai-Yip / Kabal, Peter (2004): "Speech quality estimation using Gaussian mixture models", In INTERSPEECH-2004, 2013-2016.