4th International Conference on Spoken Language Processing
Philadelphia, PA, USA
Prosody is an important aspect of speech that current text-to-speech synthesis systems fail to mimic in a convincing or natural way [1,2,3,4]. This paper describes research on a partial system for prosodic synthesis that uses easily derived, low-level syntactic information. A computer program has been developed that annotates unseen text with prosodic stress and tone marks, using the sequence of part-of-speech tags previously assigned to each word by a tagging system. Training and testing material was taken from the Lancaster/IBM Spoken English Corpus (SEC). Co-occurrence measures were calculated relating the stress and tone mark annotations to the word class annotations. A model built on these statistics calculates a score for every possible mapping between a given part-of-speech sequence and the potential stress/tone annotations. The highest-scoring pattern is selected as the most likely "baseline" annotation according to the model. The model attains up to 91% agreement with the original corpus annotations.
Bibliographic reference. Arnfield, Simon (1996): "Word class driven synthesis of prosodic annotations", In ICSLP-1996, 1978-1980.