Speech Prosody 2004
Automatic prosody recognition (APR) is of fundamental importance for automatic speech understanding. In this paper, we propose a maximum likelihood prosody recognizer consisting of a GMM-based acoustic model, which models the distribution of phone-level acoustic-prosodic observations (pitch, duration and energy), and an ANN-based language model, which models the word-level stochastic dependence between prosody and syntax. Our experiments on the Radio News Corpus show that the recognizer achieves 84% pitch accent recognition accuracy and 93% intonational phrase boundary (IPB) recognition accuracy in a leave-one-speaker-out task, exceeding previously reported results on the same corpus. The same recognizer is also tested on a subset of the Switchboard corpus; accuracies there are lower but still significantly above chance level.
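A minimal sketch of a maximum likelihood decision rule that combines the two models, assuming they are joined by Bayes' rule with the acoustic observations treated as conditionally independent of syntax given the prosody label. For each word, the sketch picks the label that maximizes log p(observations | label) + log P(label | syntax). Everything in the code is hypothetical and for illustration only: the function names `gmm_loglik` and `recognize`, the diagonal-covariance GMM, the two-label accent inventory, the 3-D normalized pitch/duration/energy features, and all parameter values. A fixed probability table also stands in for the ANN language model output.

```python
import math

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM (hypothetical model)."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            # Per-dimension Gaussian log-density
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comps.append(ll)
    m = max(comps)
    # Log-sum-exp over mixture components for numerical stability
    return m + math.log(sum(math.exp(c - m) for c in comps))

def recognize(x, acoustic_models, lm_probs):
    """Pick the label maximizing log p(x | label) + log P(label | syntax)."""
    scores = {
        label: gmm_loglik(x, *params) + math.log(lm_probs[label])
        for label, params in acoustic_models.items()
    }
    return max(scores, key=scores.get)

# Hypothetical models over (pitch, duration, energy), each feature normalized
acoustic_models = {
    "accent": ([0.5, 0.5], [[1.0, 1.0, 1.0], [2.0, 0.5, 1.0]], [[1.0] * 3, [1.0] * 3]),
    "none": ([1.0], [[-1.0, -1.0, -1.0]], [[1.0] * 3]),
}

print(recognize((1.2, 0.9, 1.1), acoustic_models, {"accent": 0.5, "none": 0.5}))  # accent
print(recognize((0.0, 0.0, 0.0), acoustic_models, {"accent": 0.5, "none": 0.5}))  # none
print(recognize((0.0, 0.0, 0.0), acoustic_models, {"accent": 0.9, "none": 0.1}))  # accent
```

The last two calls show the design point of the combination: when the acoustic evidence is ambiguous, a confident syntactic prediction from the language model can change the decision.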
Bibliographic reference. Chen, Ken / Hasegawa-Johnson, Mark / Cohen, Aaron / Cole, Jennifer (2004): "A maximum likelihood prosody recognizer", In SP-2004, 509-512.