The work reported in this paper arose from the need to label a large corpus of spontaneous, task-oriented dialogue with prosodic prominences. A computational model using only word duration, part of speech and a dictionary lookup of each word's canonical phonemic contents was trained against the results of a human coder marking prominence. Because word durations were normalised, it was possible to set a common threshold for all members of a form class, above which the lexically stressed syllables were classed as prominent. The method used is presented, and the relative importance of duration information, phonemic contents, syllabic context and part of speech information is explored. The automatic coder was validated against unseen material and achieved 58% agreement with a human coder. Further investigation showed that three human coders agreed no better with each other than each agreed with the computational model. Thus, although the automatic system did not conform very well to the performance of any one human coder, it conformed as well as another human coder might.
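The thresholding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how durations were normalised or how agreement was measured, so the sketch assumes that a word's duration is divided by an expected duration summed from per-phoneme means, and that agreement is the simple proportion of matching labels. The names `Word`, `MEAN_PHONE_DUR`, `THRESHOLDS`, `mark_prominence` and `agreement`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mean phoneme durations in seconds (illustrative values only).
MEAN_PHONE_DUR = {"b": 0.06, "a": 0.10, "n": 0.06, "t": 0.05, "h": 0.05, "e": 0.08}

# Hypothetical thresholds on normalised duration, one per form class.
THRESHOLDS = {"noun": 1.2, "det": 1.5}


@dataclass
class Word:
    text: str
    pos: str              # form class (part of speech)
    duration: float       # observed word duration in seconds
    phonemes: list[str]   # canonical phonemic contents from a dictionary lookup
    stressed: list[int]   # indices of lexically stressed syllables


def normalised_duration(word: Word) -> float:
    """Observed duration relative to the duration expected from the phonemes.

    Assumed normalisation scheme; the source only says durations were normalised.
    """
    expected = sum(MEAN_PHONE_DUR[p] for p in word.phonemes)
    return word.duration / expected


def mark_prominence(words: list[Word]) -> list[list[int]]:
    """Return, per word, the stressed syllables classed as prominent.

    A word's lexically stressed syllables are marked only when its normalised
    duration exceeds the common threshold for its form class.
    """
    marked = []
    for w in words:
        if normalised_duration(w) > THRESHOLDS[w.pos]:
            marked.append(list(w.stressed))
        else:
            marked.append([])
    return marked


def agreement(coder_a: list[bool], coder_b: list[bool]) -> float:
    """Proportion of items given the same label by two coders (assumed measure)."""
    if len(coder_a) != len(coder_b):
        raise ValueError("coders must label the same items")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


words = [
    Word("banana", "noun", 0.60, ["b", "a", "n", "a", "n", "a"], [1]),
    Word("the", "det", 0.10, ["t", "h", "e"], []),
    Word("banana", "noun", 0.40, ["b", "a", "n", "a", "n", "a"], [1]),
]
prominent = mark_prominence(words)
```

Because the threshold is applied to a normalised value rather than a raw duration, one threshold per form class can serve words of very different lengths, which is the property the abstract relies on.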
Cite as: Aylett, M., Bull, M. (1998) The automatic marking of prominence in spontaneous speech using duration and part of speech information. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0825, doi: 10.21437/ICSLP.1998-107
@inproceedings{aylett98_icslp,
  author={Matthew Aylett and Matthew Bull},
  title={{The automatic marking of prominence in spontaneous speech using duration and part of speech information}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0825},
  doi={10.21437/ICSLP.1998-107}
}