Speech communication channels and their components (e.g., codecs) are generally designed to optimize perceived speech quality. However, transmission channels should also preserve the principal speaker-specific characteristics that allow end listeners to identify speakers with acceptable accuracy. This paper takes a first step towards predicting human speaker identification performance from instrumental quality measures. Correspondences between speech quality and speaker identification accuracy are shown by fitting straight lines to data points obtained from different channel transmissions. Narrowband, wideband, and super-wideband channels are considered, together with other distortions typically associated with them. Our analyses show that Coloration, one of the perceptual quality dimensions, can be a better predictor of human speaker identification performance than overall quality predictions expressed as Mean Opinion Scores. This suggests that the speaker-specific properties of the voice are mainly impaired by the distortion of frequency components in the transmission path.
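The linear fitting mentioned in the abstract can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of identification accuracy against one instrumental quality score per channel condition; the abstract does not state the exact fitting procedure. The function name `fit_line`, the variable names, and all numbers below are hypothetical placeholders, not data or code from the paper.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept.

    Returns (slope, intercept, r), where r is the Pearson correlation
    between x and y. This is an illustrative sketch, not the authors' code.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical placeholder values, one pair per channel condition.
# They are NOT measurements from the paper.
predicted_quality = [1.0, 2.0, 3.0, 4.0]        # e.g. an instrumental quality score
identification_accuracy = [0.5, 0.6, 0.7, 0.8]  # fraction of speakers correctly identified

slope, intercept, r = fit_line(predicted_quality, identification_accuracy)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

Under these assumptions, two candidate predictors (for instance a Coloration estimate and an overall Mean Opinion Score estimate) could be compared by fitting each against the same accuracy values and checking which yields the larger absolute correlation.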
Bibliographic reference. Gallardo, Laura Fernández / Möller, Sebastian (2015): "Towards the prediction of human speaker identification performance from measured speech quality", In INTERSPEECH-2015, 443-447.