In this paper, we explore the differences between direct and symbolic sequential modeling of prosody. We use sequential models to characterize speech in two tasks: classifying speaking style and distinguishing native from non-native speech. We explore the use of a spike-and-slab model to model pitch contour data directly. In both tasks, we find that sequences of symbolic prosodic events lead to improved performance over approaches that model pitch contours directly. We also explore the use of hypothesized (automatically annotated) prosodic events in both tasks. We find the speaking-style results to be robust to automatic annotation, whereas, when classifying nativeness, the spike-and-slab model leads to better performance.
Bibliographic reference. Rosenberg, Andrew (2011): "Symbolic and direct sequential modeling of prosody for classification of speaking-style and nativeness", In INTERSPEECH-2011, 1065-1068.