The 6th Workshop on Disfluency in Spontaneous Speech (DiSS 2013)

Stockholm, Sweden
August 21-23, 2013

HESITA(tions) in Portuguese: A Database

Sara Candeias (1), Dirce Celorico (1), Jorge Proença (1), Arlindo Veiga (1,2), Fernando Perdigão (1,2)

(1) Instituto de Telecomunicações, Coimbra, Portugal
(2) Electrical and Computer Engineering Department, University of Coimbra, Portugal

This paper presents HESITA, a European Portuguese database of hesitations in speech. The database contains annotations of hesitation events such as filled pauses, vocalic extensions, truncated words, repetitions and substitutions. The hesitations were identified in 30 daily news programs collected from podcasts of a Portuguese television channel. The database also includes speaking style classification, acoustic information and other speech events. A statistical analysis of the occurrence of the hesitation events is presented. The database contains relevant information about how Portuguese speakers hesitate, from which insights into the process of human speech communication can be extracted. The HESITA database is freely available online to the research community.

Index Terms: hesitations, disfluency, prepared speech, spontaneous speech, annotation, hesitation corpus

Bibliographic reference. Candeias, Sara / Celorico, Dirce / Proença, Jorge / Veiga, Arlindo / Perdigão, Fernando (2013): "HESITA(tions) in Portuguese: a database", In DiSS-2013, 13-16.