5th International Conference on Spoken Language Processing
This contribution describes a method for the automatic prosodic labeling of multilingual speech data. The automatic labeler assigns a boundary strength between 0 and 3 to each word boundary, and a word prominence between 0 and 9 to each word. The speech signal and its orthographic representation are first transformed into feature vectors comprising acoustic and linguistic features such as pitch, duration, energy, part-of-speech, punctuation, word frequency and stress. Next, the feature vectors are mapped to prosodic labels via a cascade of multi-layer perceptrons. Experiments on six languages demonstrate that combining acoustic with linguistic features yields better performance than is obtainable from acoustic features alone. We also present experiments that assess how the quality of the underlying phonetic segmentation and labeling influences prosodic labeling performance.
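To illustrate the data flow described in the abstract, the following is a minimal sketch of a two-stage perceptron cascade in Python with NumPy. It is not the authors' implementation. The abstract does not specify the cascade order, the network sizes, or the feature dimensions, so everything here is an assumption: the first network predicts boundary strength (4 classes, 0 to 3), its posteriors are appended to the features, and the second network predicts word prominence (10 classes, 0 to 9). The weights are random and untrained, the feature values are synthetic, and each word is simplified to carry one boundary (the one that follows it). The names `init_mlp`, `mlp_forward` and `label_utterance` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(n_in, n_hidden, n_out):
    """Random, untrained weights for a one-hidden-layer perceptron (illustrative only)."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def mlp_forward(params, x):
    """Return softmax class posteriors for each row of x."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    logits = hidden @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def label_utterance(acoustic, linguistic, boundary_mlp, prominence_mlp):
    """Map per-word features to (boundary strength, prominence) labels.

    Assumed cascade: stage 1 predicts boundary strength; stage 2 sees the
    original features plus the stage-1 posteriors and predicts prominence.
    """
    features = np.hstack([acoustic, linguistic])
    boundary_post = mlp_forward(boundary_mlp, features)
    stage2_input = np.hstack([features, boundary_post])
    prominence_post = mlp_forward(prominence_mlp, stage2_input)
    return boundary_post.argmax(axis=1), prominence_post.argmax(axis=1)

# Synthetic stand-ins for real features (pitch, duration, energy, POS, ...).
n_words, n_acoustic, n_linguistic = 5, 6, 4
acoustic = rng.normal(size=(n_words, n_acoustic))
linguistic = rng.normal(size=(n_words, n_linguistic))

boundary_mlp = init_mlp(n_acoustic + n_linguistic, 8, 4)        # labels 0..3
prominence_mlp = init_mlp(n_acoustic + n_linguistic + 4, 8, 10)  # labels 0..9

boundaries, prominences = label_utterance(
    acoustic, linguistic, boundary_mlp, prominence_mlp
)
print(boundaries, prominences)
```

An acoustic-only baseline of the kind the abstract compares against could be approximated in this sketch by dropping the `linguistic` columns and resizing the input layers accordingly. With untrained weights the printed labels carry no meaning, so only the shapes and label ranges are informative.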
Bibliographic reference. Vereecken, Halewijn / Martens, Jean-Pierre / Grover, Cynthia / Fackrell, Justin / Van Coile, Bert (1998): "Automatic prosodic labeling of 6 languages", in ICSLP-1998, paper 0045.