Disfluency in Spontaneous Speech (DiSS'01)

August 29-31, 2001
Edinburgh, Scotland, UK

Annotation and Analysis of Disfluencies in a Spontaneous Speech Corpus in Spanish

L. J. Rodríguez, I. Torres, A. Varona

Departamento de Electricidad y Electrónica, UPV/EHU, Bilbao, Spain

A new database consisting of 227 dialogues in Spanish was annotated with disfluencies, and a detailed analysis of the annotations was then carried out. The database had been recorded according to the well-known Wizard of Oz paradigm. Each of seventy-five speakers was given three different scenarios in which to make queries about timetables, prices and other conditions of train journeys between two Spanish cities. The notion of disfluency was relaxed to include any acoustic, lexical or syntactic feature that distinguishes spontaneous from read speech. A specific XML annotation scheme was developed: a simple text editor was used to insert the marks, and a dedicated parser was implemented to detect errors in the annotations. The analysis revealed that disfluencies were not uniformly distributed across either user turns or speakers. Most disfluencies were concentrated in certain user turns, especially the first one. In addition, some speakers were markedly more prone than others to hesitate, repeat or correct fragments of speech.
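The abstract does not give the XML annotation scheme itself, so the following is only a minimal sketch, assuming each user turn is stored as one XML fragment wrapped in a `<turn>` element. The tag names (`turn`, `filled_pause`, `repetition`, `correction`, `fragment`) and the helper functions are hypothetical. The sketch shows how an error-checking parser of the kind described could reject ill-formed or unknown marks with Python's standard `xml.etree.ElementTree`, and how disfluency counts could be aggregated by turn position to study their distribution across turns.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical tag inventory: the paper's actual XML scheme is not reproduced here.
ALLOWED_TAGS = {"turn", "filled_pause", "repetition", "correction", "fragment"}


def check_turn(annotated: str) -> list[str]:
    """Return error messages for one annotated user turn (empty list if valid)."""
    try:
        root = ET.fromstring(annotated)
    except ET.ParseError as exc:
        # Unbalanced or misnested marks make the turn ill-formed XML.
        return [f"malformed annotation: {exc}"]
    errors = []
    # root.iter() visits the root element itself and all its descendants.
    for elem in root.iter():
        if elem.tag not in ALLOWED_TAGS:
            errors.append(f"unknown tag <{elem.tag}>")
    return errors


def count_disfluencies(annotated: str) -> int:
    """Count disfluency marks: every element except the enclosing <turn>."""
    root = ET.fromstring(annotated)
    return sum(1 for elem in root.iter() if elem.tag != "turn")


def disfluencies_by_turn_position(dialogues: list[list[str]]) -> Counter:
    """Sum disfluency counts per turn position (1 = first user turn)."""
    totals: Counter = Counter()
    for dialogue in dialogues:
        for position, turn in enumerate(dialogue, start=1):
            totals[position] += count_disfluencies(turn)
    return totals
```

For example, `check_turn('<turn>quiero <repetition>un un</turn>')` returns a single "malformed annotation" message because the `<repetition>` mark is never closed. Relying on an XML parser for the well-formedness check keeps the validator small; a real scheme would likely also check attribute values and nesting rules.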

Bibliographic reference. Rodríguez, L. J. / Torres, I. / Varona, A. (2001): "Annotation and analysis of disfluencies in a spontaneous speech corpus in Spanish", In DISS'01, 1-4.