This paper reports on initial experiments to estimate the overall quality of synthesized speech transmitted over telephone channels, using the reference-free quality prediction model described in ITU-T Rec. P.563. In three tests, naturally produced and synthesized speech samples were transmitted over various telephone channels and then judged by test listeners with respect to their overall quality. The mean auditory ratings obtained in these tests were compared to the estimates provided by the P.563 model. Correlations between auditory and estimated quality scores vary considerably between experiments. It is concluded that the P.563 model mainly predicts the effects of the transmission channel, but not those of the (naturally produced or synthesized) source speech material.
Cite as: Möller, S., Heimansberg, J. (2006) Estimation of TTS quality in telephone environments using a reference-free quality prediction model. Proc. 2nd ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems (PQS 2006), 56-60
@inproceedings{moller06_pqs,
  author={Sebastian Möller and Johannes Heimansberg},
  title={{Estimation of TTS quality in telephone environments using a reference-free quality prediction model}},
  year=2006,
  booktitle={Proc. 2nd ISCA/DEGA Tutorial and Research Workshop on Perceptual Quality of Systems (PQS 2006)},
  pages={56--60}
}