The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013)

Stockholm, Sweden
August 21-23, 2013

Automatic Structural Metadata Identification based on multilayer prosodic information

Helena Moniz (1,2), Fernando Batista (1,3), Isabel Trancoso (1,4), Ana Isabel Mata (2)

(1) Spoken Language Systems Lab – INESC-ID, Lisbon, Portugal
(2) FLUL/CLUL, Universidade de Lisboa, Portugal
(3) ISCTE – Instituto Universitário de Lisboa, Portugal
(4) IST, Lisboa, Portugal

This paper discriminates between different types of structural metadata in transcripts of university lectures: boundary events (commas, full stops, and interrogatives) and disfluencies (repairs). The disambiguation process is based on predefined multilayered linguistic information and on its hierarchical structure. Because boundary events may share similar linguistic properties, in terms of F0 and energy slopes, the presence or absence of silent pauses, and the duration of different units of analysis, we applied different classification methods, based on a set of automatically derived prosodic features, to differentiate between those events and disfluencies. The paper also presents a detailed analysis of the impact of each individual feature in discriminating each structural event. The results of our data-driven approach allow us to arrive at a structured set of basic features for disambiguating metadata events. These results are a step towards the analysis of speech acts and their disambiguation from disfluencies.

Index Terms: disfluencies, automatic speech processing, structural metadata, speech prosody

Bibliographic reference. Moniz, Helena / Batista, Fernando / Trancoso, Isabel / Mata, Ana Isabel (2013): "Automatic structural metadata identification based on multilayer prosodic information", in DiSS-2013, 49-52.