7th International Conference on Spoken Language Processing

September 16-20, 2002
Denver, Colorado, USA

Time-Compressing Natural and Synthetic Speech

Esther Janse

University of Utrecht, The Netherlands

Phoneme detection is a useful tool to compare the perception of perfectly intelligible speech types. As previous research suggests that perception of fast speech is helped by segmental redundancy, we expected the hyperarticulation of synthetic speech to turn into an advantage at a fast rate. Consequently, the processing advantage of natural over synthetic speech was expected to decrease after time-compression. In addition, detection times were expected to be slower after moderate time-compression because of the higher processing difficulty of fast speech. However, detection times tended to become shorter in the time-compressed condition. This was attributed to the shorter durations of syllables and words. Furthermore, the processing advantage of natural over synthetic speech did not decrease, but rather tended to increase. This may be explained by the lack of a speaking effort pattern in synthetic diphone speech, which makes it rather blurred at faster playback rates.

Bibliographic reference. Janse, Esther (2002): "Time-compressing natural and synthetic speech", in ICSLP-2002, 1645-1648.