A new database consisting of 227 dialogues in Spanish was annotated with disfluencies, and a detailed analysis of the annotations was then carried out. The database had been recorded according to the well-known Wizard of Oz paradigm. Seventy-five speakers were each given three different scenarios in which to make queries about timetables, prices and other conditions of train travel between two Spanish cities. The notion of disfluency was relaxed to include any acoustic, lexical or syntactic feature that distinguishes spontaneous from read speech. A specific XML annotation scheme was developed. A simple text editor was used to insert the marks, and a dedicated parser was implemented to find errors in the annotations. The analysis revealed that disfluencies were not uniformly distributed across either user turns or speakers. Most disfluencies were concentrated in certain user turns, especially the first one. In addition, some speakers were remarkably more prone than others to hesitate, repeat or correct fragments of speech.
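As an illustration of the kind of checking and counting described in the abstract, the following is a minimal sketch only, not the authors' parser. The abstract does not specify the annotation scheme, so the tag names (`dialogue`, `turn`, `hesitation`, `repetition`, `correction`), the attributes (`n`, `who`, `speaker`) and the sample dialogue are all hypothetical. The sketch flags annotations that are not well-formed XML or that use an unknown tag, and counts disfluency marks per user turn.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical tag set: the actual scheme is not given in the abstract.
DISFLUENCY_TAGS = {"hesitation", "repetition", "correction"}
STRUCTURE_TAGS = {"dialogue", "turn"}


def find_errors(xml_text):
    """Return a list of error messages for one annotated dialogue."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Unclosed or mismatched marks inserted by hand end up here.
        return [f"not well-formed: {exc}"]
    allowed = DISFLUENCY_TAGS | STRUCTURE_TAGS
    return [f"unknown tag <{el.tag}>" for el in root.iter() if el.tag not in allowed]


def disfluencies_per_turn(xml_text):
    """Map each user turn number to the number of disfluency marks it contains."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for turn in root.iter("turn"):
        if turn.get("who") != "user":
            continue  # skip the wizard's turns
        marks = sum(1 for el in turn.iter() if el.tag in DISFLUENCY_TAGS)
        counts[int(turn.get("n"))] = marks
    return counts


# Invented sample dialogue, for illustration only.
SAMPLE = """<dialogue speaker="s01">
<turn n="1" who="user"><hesitation>eh</hesitation> quería <repetition>un un</repetition> billete</turn>
<turn n="2" who="wizard">¿A qué ciudad?</turn>
<turn n="3" who="user">a Bilbao</turn>
</dialogue>"""

if __name__ == "__main__":
    print(find_errors(SAMPLE))
    print(dict(disfluencies_per_turn(SAMPLE)))
```

Summing such per-turn counts over all dialogues, or grouping them by the hypothetical `speaker` attribute, would give the kind of per-turn and per-speaker distributions the analysis refers to.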
Cite as: Rodríguez, L.J., Torres, I., Varona, A. (2001) Annotation and analysis of disfluencies in a spontaneous speech corpus in Spanish. Proc. ITRW on Disfluency in Spontaneous Speech (DiSS 2001), 1-4
@inproceedings{rodriguez01_diss,
  author={L. J. Rodríguez and I. Torres and A. Varona},
  title={{Annotation and analysis of disfluencies in a spontaneous speech corpus in Spanish}},
  year=2001,
  booktitle={Proc. ITRW on Disfluency in Spontaneous Speech (DiSS 2001)},
  pages={1--4}
}