Workshop on the Auditory Basis of Speech Perception, Keele University, UK
For many tasks in speech signal processing it is of interest to develop an objective measure that correlates well with the perceptual distance between speech segments. (By speech segments we mean pieces of a speech signal of duration 50-150 milliseconds. For concreteness we will consider a segment to mean a diphone.) Such a distance metric would be useful for speech coding at low bit rates. Saving bits in those systems relies on a perceptual tolerance to acoustic deviations from the original speech, deviations that typically last for several tens of milliseconds. Such a distance metric would also be useful for automatic speech recognition on the assumption that perceptual invariance to adverse signal conditions (noise, microphone and channel distortions, room reverberations) and to phonemic variability (due to non-uniqueness of articulatory gestures) may provide a basis for robust performance. In this talk we will describe our attempts at defining such a metric.
Bibliographic reference. Ghitza, Oded / Sondhi, M. Mohan (1996): "On the perceptual distance between speech segments", In ABSP-1996, 141-143.