Third International Conference on Spoken Language Processing (ICSLP 94)
This paper reports an objective measure for assessing low-bit-rate coded speech. The underlying model emulates several known features of the perceptual processing of speech sounds by the human ear: the Hertz-to-Bark transformation, critical-band filtering with preemphasis to boost higher frequencies, nonlinear conversion to subjective loudness, and a temporal (forward) masking process. The Bark spectral distortion rating (BSDR) is computed for each 10-20 ms segment of the original and coded speech. The measure was validated by regression analysis between the computed BSDR values and subjective MOS ratings obtained for a large number of utterances coded by several versions of a CELP coder and a VSELP coder under three kinds of degraded conditions: input speech levels, transmission error rates, and background noise levels. The BSDR values correspond better to MOS ratings than several commonly used measures do. As a result, BSDR can be used to accurately predict subjective scores.
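The abstract does not give the formulas behind BSDR, so the following is only a minimal sketch of a Bark-domain distortion for one speech segment, under stated assumptions. It uses the Zwicker and Terhardt approximation for the Hertz-to-Bark mapping, rectangular one-Bark-wide bands in place of critical-band filters, and a power law with exponent 0.3 as a stand-in for the loudness conversion. Preemphasis and forward masking are omitted. All function names (`hz_to_bark`, `bark_band_energies`, `loudness`, `segment_distortion`) and parameter values are hypothetical and are not taken from the paper.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Hertz-to-Bark mapping."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_band_energies(power_spectrum, sample_rate, n_bands=18):
    """Sum power-spectrum bins into one-Bark-wide bands.

    Assumes the bins are evenly spaced from 0 Hz to the Nyquist frequency.
    Rectangular bands are a simplification of critical-band filtering.
    """
    n_bins = len(power_spectrum)
    if n_bins < 2:
        raise ValueError("power_spectrum needs at least two bins")
    bands = [0.0] * n_bands
    for k, power in enumerate(power_spectrum):
        freq = k * (sample_rate / 2.0) / (n_bins - 1)
        band = int(hz_to_bark(freq))
        if band < n_bands:  # bins above the last band are ignored
            bands[band] += power
    return bands


def loudness(band_energies, exponent=0.3):
    """Compressive power law; the exponent is an assumed placeholder."""
    return [energy ** exponent for energy in band_energies]


def segment_distortion(orig_spectrum, coded_spectrum, sample_rate):
    """Squared Euclidean distance between the two Bark loudness spectra."""
    orig = loudness(bark_band_energies(orig_spectrum, sample_rate))
    coded = loudness(bark_band_energies(coded_spectrum, sample_rate))
    return sum((a - b) ** 2 for a, b in zip(orig, coded))
```

In use, each 10-20 ms segment of the original and coded signals would be converted to a power spectrum, passed to `segment_distortion`, and the per-segment values averaged over the utterance before being compared with MOS ratings.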
Bibliographic reference. Watanabe, Toshiro / Hayashi, Shinji (1994): "An objective measure for qualitatively assessing low-bit-rate coded speech", In ICSLP-1994, 1331-1334.