ISCA Archive ICSLP 1994

On the perceptual distance between speech segments

Oded Ghitza, M. Mohan Sondhi

For many tasks in speech signal processing it is of interest to develop an objective measure that correlates well with the perceptual distance between speech segments. (By speech segments we mean pieces of a speech signal with a duration of 50-200 milliseconds. For concreteness, we take a segment to be a diphone.) Such a distance metric would be useful for low-bit-rate speech coders, because the perturbations introduced by such coders typically last for several tens of milliseconds. It would also be useful for automatic speech recognition, on the assumption that mimicking human behavior will improve recognition performance. A third use for such a metric would be to define a just noticeable difference (JND) for diphones, a "phonemic" JND: if a diphone is perturbed, how far from the original must the perturbed diphone be in order to be perceived as a different diphone? In this talk we describe our attempts at defining such a metric.


Cite as: Ghitza, O., Sondhi, M.M. (1994) On the perceptual distance between speech segments. Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994), 499-502

@inproceedings{ghitza94_icslp,
  author={Oded Ghitza and M. Mohan Sondhi},
  title={{On the perceptual distance between speech segments}},
  year=1994,
  booktitle={Proc. 3rd International Conference on Spoken Language Processing (ICSLP 1994)},
  pages={499--502}
}