Automatic detection of prominence or pitch accent is important because it enables automatic prosodic annotation of speech corpora and provides additional features for other tasks such as keyword detection. In this paper, we evaluate how accent detection performance changes with the choice of base unit and the kind of boundary information available. We compare word-, syllable-, and vowel-based units when their boundaries are provided. We also automatically estimate syllable boundaries from energy contours when phone-level alignment is available. In addition, we use a fixed-length sliding window when boundaries are unknown. Our experiments show that when boundary information is available, using a longer base unit achieves better performance. When no boundary information is available, a fixed-size moving window achieves performance similar to that of syllable information in word-level evaluation, suggesting that accent detection can be performed without relying on a speech recognizer to generate boundaries.
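As an illustration of the unknown-boundary condition, the following minimal sketch shows how a fixed-length window might be slid over a frame-level energy contour to produce candidate units. It is not the authors' implementation: the function names, the window length, the hop size, and the choice of mean and peak energy as per-window features are all assumptions made for this example, since the abstract does not specify them.

```python
from typing import List, Sequence, Tuple


def sliding_windows(num_frames: int, win_len: int, hop: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of fixed-length windows.

    win_len and hop are hypothetical parameters measured in frames.
    Trailing frames that do not fill a whole window are dropped.
    """
    if win_len <= 0 or hop <= 0:
        raise ValueError("win_len and hop must be positive")
    if num_frames < win_len:
        return []
    return [(s, s + win_len) for s in range(0, num_frames - win_len + 1, hop)]


def window_features(
    energy: Sequence[float], win_len: int, hop: int
) -> List[Tuple[int, int, float, float]]:
    """Compute (start, end, mean energy, peak energy) for each window.

    Mean and peak energy are illustrative features only (an assumption);
    a real detector would typically add pitch and duration cues.
    """
    feats = []
    for start, end in sliding_windows(len(energy), win_len, hop):
        segment = energy[start:end]
        feats.append((start, end, sum(segment) / len(segment), max(segment)))
    return feats
```

Each window would then be classified as accented or unaccented in place of a word, syllable, or vowel unit. Overlapping windows (a hop smaller than the window length) trade extra computation for finer time resolution.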
Bibliographic reference. Jeon, Je Hun / Liu, Yang (2009): "Automatic accent detection: effect of base units and boundary information", In INTERSPEECH-2009, 180-183.