Third International Conference on Spoken Language Processing (ICSLP 94)

Yokohama, Japan
September 18-22, 1994

A Computational Model of Prosody Perception

Neil P. McAngus Todd (1), Guy J. Brown (2)

(1) Department of Music, University of Sheffield, UK
(2) Department of Computer Science, University of Sheffield, UK

This paper describes a computational model of auditory rhythm perception, and demonstrates its application to the extraction of prosodic information from spoken language. The model consists of three stages. In the first stage, the speech waveform is processed by a simulation of the auditory periphery. In the second stage, the output of the auditory periphery is processed by a multiscale filtering mechanism, analogous to a short-term auditory memory. In the final stage, peaks in the response of the multiscale mechanism are accumulated in a long-term auditory store, and plotted to give a representation referred to as a rhythmogram. It is demonstrated that there is a close relationship between the rhythmogram of an utterance and its corresponding stress hierarchy derived by phonological analysis.
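The three-stage pipeline described above can be sketched in outline. The following is an illustrative reconstruction, not the authors' implementation: the auditory-periphery stage is replaced by a precomputed amplitude envelope, the multiscale stage is approximated by Gaussian lowpass filtering at several time constants, and the function names, time constants, and toy signal are all hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma, fs):
    # Unit-area Gaussian lowpass kernel, truncated at 3 sigma.
    # sigma is a time constant in seconds; fs is the sample rate in Hz.
    n = int(3 * sigma * fs)
    t = np.arange(-n, n + 1) / fs
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def rhythmogram(envelope, fs, sigmas):
    # One row per time scale: smooth the envelope, then keep only
    # the values at strict local maxima (peaks), zero elsewhere.
    rows = []
    for sigma in sigmas:
        smoothed = np.convolve(envelope, gaussian_kernel(sigma, fs), mode="same")
        peaks = np.zeros_like(smoothed)
        interior = (smoothed[1:-1] > smoothed[:-2]) & (smoothed[1:-1] > smoothed[2:])
        peaks[1:-1][interior] = smoothed[1:-1][interior]
        rows.append(peaks)
    return np.array(rows)

# Toy "envelope": three amplitude pulses, the middle one strongest,
# standing in for the output of the peripheral stage.
fs = 100                      # samples per second
env = np.zeros(300)
env[100] = 1.0
env[130] = 2.0
env[160] = 1.0

# Three hypothetical time scales, from fine to coarse (seconds).
R = rhythmogram(env, fs, sigmas=[0.02, 0.06, 0.15])
```

At the fine scale each pulse survives as a separate peak, while at the coarsest scale the three pulses merge into a single peak at the strongest event, which is the kind of scale-dependent grouping a rhythmogram-style representation is meant to expose.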

Bibliographic reference.  Todd, Neil P. McAngus / Brown, Guy J. (1994): "A computational model of prosody perception", In ICSLP-1994, 127-130.