Workshop on the Auditory Basis of Speech Perception

Keele University, UK
July 15-19, 1996

On the Perceptual Distance Between Speech Segments

Oded Ghitza, M. Mohan Sondhi

Lucent Technologies Bell Laboratories, Acoustics Research Department, Murray Hill, New Jersey, USA

For many tasks in speech signal processing, it is of interest to develop an objective measure that correlates well with the perceptual distance between speech segments. (By speech segments we mean pieces of a speech signal 50-150 milliseconds in duration; for concreteness, we will take a segment to be a diphone.) Such a distance metric would be useful for speech coding at low bit rates, where saving bits relies on a perceptual tolerance to acoustic deviations from the original speech, deviations that typically last for several tens of milliseconds. It would also be useful for automatic speech recognition, on the assumption that perceptual invariance to adverse signal conditions (noise, microphone and channel distortions, room reverberation) and to phonemic variability (due to non-uniqueness of articulatory gestures) may provide a basis for robust performance. In this talk we will describe our attempts at defining such a metric.


Bibliographic reference. Ghitza, Oded / Sondhi, M. Mohan (1996): "On the perceptual distance between speech segments", in ABSP-1996, 141-143.