Third European Conference on Speech Communication and Technology

Berlin, Germany
September 22-25, 1993


Cluster-Similarity: A Useful Database for Speech Processing

Ute Jekosch

Lehrstuhl für allgemeine Elektrotechnik und Akustik, Ruhr-Universität Bochum, Bochum, Germany

People working on spoken language technology are confronted, among other things, with the multiformity of speech as a surface phenomenon. Automatic speech recognition is extremely difficult because of strong inter- and intra-speaker variability: people speak differently in different situations. Developers of speech synthesis systems try to map that multiformity of natural speech onto automatically generated speech. For both fields, speech recognition and speech synthesis, it is of central importance to understand the underlying principles of speech production and perception. Different scientific disciplines have contributed information on this issue. In this paper a psycho-acoustic approach is described. Similarity profiles representing spaces of perceptual distinction are presented: Profile A is based on judgements gained in an introspective way, Profile B visualizes judgements on natural speech, and Profile C visualizes judgements on synthetic speech. The study shows that the profile for synthetic speech differs quite severely from that for natural speech. The paper concentrates on describing the perceptual dimensional representations of natural and synthetic speech. Data are compared and interpreted with regard to their role in synthesis assessment. A detailed analysis of test results gives some indications of why speech synthesizers often suffer from poor intelligibility and acceptability.

Bibliographic reference. Jekosch, Ute (1993): "Cluster-similarity: a useful database for speech processing", in EUROSPEECH'93, 195-198.