ISCA Archive ICSLP 2000

Stem-ML: language-independent prosody description

Greg P. Kochanski, Chilin Shih

Stem-ML is a tagging system with a completely defined algorithm for translating the tags into quantitative prosody in any language. It separates the description of prosodic intentions from their execution by modeling the interactions between accents. We designed Stem-ML to allow automated training of accent shapes and parameters from acoustic databases.

Stem-ML is linguistically neutral: it allows a description of any physiologically realizable prosody in terms of linguistic concepts, without imposing a restrictive theory on the data. The tag set and algorithm make no assumptions about the number of distinct types of accents or tones, or their scope. Accents and tones are treated interchangeably. Stem-ML allows, but does not require, descriptions involving phrase curves.

The model begins with soft templates for tone or accent shapes that are specified by the user or obtained by automated training. These soft templates interact because of physically and physiologically motivated constraints that model the smooth and continuous motions of the muscles that control prosody.
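To make the idea of interacting soft templates concrete, here is a minimal sketch in Python. It is not the Stem-ML algorithm, which this abstract does not specify. It assumes a simple quadratic trade-off: each frame has a target pitch and a strength (how firmly the template insists on its target), and a smoothness penalty on frame-to-frame pitch changes stands in for the muscle constraints. The names `realize_pitch`, `solve_tridiagonal`, `strengths`, and `smoothness` are hypothetical and chosen only for illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm.

    lower[i] multiplies x[i-1] in row i, upper[i] multiplies x[i+1].
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def realize_pitch(targets, strengths, smoothness=1.0):
    """Return the pitch contour p that minimizes (illustrative assumption):

        sum_t strengths[t] * (p[t] - targets[t])**2
        + smoothness * sum_t (p[t+1] - p[t])**2

    Setting the gradient to zero gives a tridiagonal linear system.
    Requires at least two frames and at least one positive strength.
    """
    n = len(targets)
    lower = [-smoothness] * n
    upper = [-smoothness] * n
    diag = []
    rhs = []
    for t in range(n):
        neighbours = (1 if t > 0 else 0) + (1 if t < n - 1 else 0)
        diag.append(strengths[t] + smoothness * neighbours)
        rhs.append(strengths[t] * targets[t])
    return solve_tridiagonal(lower, diag, upper, rhs)


# Two adjacent templates: a strong high accent followed by a weak low one.
targets = [1.0] * 5 + [0.0] * 5
strengths = [5.0] * 5 + [0.2] * 5
pitch = realize_pitch(targets, strengths, smoothness=1.0)
print([round(p, 3) for p in pitch])
```

In this toy setup the strong template stays close to its target while the weak one is pulled toward its neighbor, which is the kind of compromise between adjacent accents that the soft-template description suggests.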

Cite as: Kochanski, G.P., Shih, C. (2000) Stem-ML: language-independent prosody description. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 3, 239-242
