Sixth International Conference on Spoken Language Processing
Stem-ML is a tagging system with a fully specified algorithm for translating the tags into quantitative prosody in any language. It separates the description of prosodic intentions from their execution by modeling the interactions between accents. We designed Stem-ML to allow accent shapes and parameters to be trained automatically from acoustic databases.
Stem-ML is linguistically neutral: it allows a description of any physiologically realizable prosody in terms of linguistic concepts, without imposing a restrictive theory on the data. The tag set and algorithm make no assumptions about the number of distinct types of accents or tones, or their scope. Accents and tones are treated interchangeably. Stem-ML allows, but does not require, descriptions involving phrase curves.
The model begins with soft templates for tone or accent shapes that are specified by the user or obtained by automated training. These soft templates interact because of physically and physiologically motivated constraints that model the smooth and continuous motions of the muscles that control prosody.
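The idea of soft templates interacting under a smoothness constraint can be illustrated with a small numerical sketch. This is not the Stem-ML algorithm itself, whose details are not given here. It is a minimal least-squares stand-in that assumes each template is a short target contour with a scalar strength, and that smoothness is modeled by penalizing frame-to-frame changes in the pitch curve. The function name `blend_soft_templates`, the `(start, shape, strength)` tuple layout, and the `smoothness` parameter are all hypothetical.

```python
import numpy as np

def blend_soft_templates(n_frames, templates, smoothness=10.0):
    """Blend soft accent templates into one smooth pitch curve.

    Hypothetical sketch, not the published Stem-ML algorithm.
    templates: list of (start_frame, shape, strength); a larger strength
    makes the curve follow that template more closely.
    Minimizes  sum_k strength_k * |p[window_k] - shape_k|^2
             + smoothness * sum_t (p[t+1] - p[t])^2.
    """
    # First-difference operator: (D @ p)[t] = p[t+1] - p[t]
    D = np.diff(np.eye(n_frames), axis=0)
    # Normal equations A p = b; the tiny ridge term keeps A invertible
    # when some frames are not covered by any template.
    A = smoothness * D.T @ D + 1e-6 * np.eye(n_frames)
    b = np.zeros(n_frames)
    for start, shape, strength in templates:
        shape = np.asarray(shape, dtype=float)
        idx = np.arange(start, start + len(shape))
        A[idx, idx] += strength       # adds to the diagonal entries only
        b[idx] += strength * shape
    return np.linalg.solve(A, b)

# A strong high accent followed immediately by a weak low accent.
pitch = blend_soft_templates(
    20,
    [(2, [1, 1, 1, 1], 100.0), (6, [-1, -1, -1, -1], 1.0)],
    smoothness=5.0,
)
```

In this sketch the strong accent is realized almost exactly, while the weak accent is bent toward its neighbor because the smoothness term resists an abrupt jump between the two targets. That is the sense in which the templates are "soft": each one is a preference that can be compromised, not a fixed contour.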
Bibliographic reference. Kochanski, Greg P. / Shih, Chilin (2000): "Stem-ML: language-independent prosody description", in ICSLP-2000, vol. 3, 239-242.