Survey Talk: Prosody Research and Applications: The State of the Art

Nigel G. Ward


Prosody is essential in human interaction and relevant to every area of speech science and technology. Our understanding of prosody, although still fragmentary, is rapidly advancing. This survey will give non-specialists the knowledge needed to decide whether and how to integrate prosodic information into their models and systems. It will start with the basics: the paralinguistic, phonological and pragmatic functions of prosody, its physiology and perception, commonly and less commonly used prosodic features, and the three main approaches to modeling prosody. Regarding practical applications, it will give an overview of ways to use prosody in speech recognition, speech synthesis, dialog systems, and the inference of speaker states and traits. Recent trends will then be presented, including modeling pitch as more than a single scalar value, modeling prosody beyond just intonation, representing prosodic knowledge with constructions of multiple prosodic features in specific temporal configurations, modeling observed prosody as the result of the superposition of patterns representing independent intents, modeling multi-speaker phenomena, and the use of unsupervised methods. Finally, we will consider remaining challenges in research and applications.


Cite as: Ward, N.G. (2019) Survey Talk: Prosody Research and Applications: The State of the Art. Proc. Interspeech 2019.


@inproceedings{Ward2019,
  author={Nigel G. Ward},
  title={{Survey Talk: Prosody Research and Applications: The State of the Art}},
  year=2019,
  booktitle={Proc. Interspeech 2019}
}