Automatic recognition of spontaneous speech dialogues

Mauro Cettolo, Daniele Falavigna

This paper assesses a set of modifications applied to the continuous speech recognizer developed at IRST for dictation tasks. The modifications aim to improve recognizer performance on a corpus of spontaneously uttered human-human dialogues. Solutions are proposed to increase Automatic Speech Recognition (ASR) robustness to typical spontaneous speech phenomena such as breaths, coughs, filled and silent pauses, and speaking rate variations. Both gender-independent and gender-dependent models are used. Specific models of extra-linguistic phenomena are trained, and a method for coping with speaking rate variations is proposed. Different recognizers, corresponding to males, females, and various speaking rate factors, are combined so that a single search space is defined. The best performance, obtained on a corpus of hundreds of spontaneously uttered person-to-person dialogues, is a 26.1% Word Error Rate (WER).
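
For readers unfamiliar with the WER figure quoted above, the following is a minimal sketch of how the metric is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcription and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the scoring procedure actually used by the authors is not described in this abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level Levenshtein distance / reference length.

    Assumes whitespace tokenization and equal cost (1) for substitutions,
    deletions, and insertions. This is a generic sketch, not the authors' tool.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference. In practice, spontaneous speech scoring often also involves decisions such as whether filled pauses and other extra-linguistic events are counted as words; this sketch does not address those choices.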

