12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Redundancy Reduction in ASR of Spontaneous Speech Through Statistical Machine Translation

Daniele Falavigna

FBK-irst, Italy

This paper describes a system, based on statistical machine translation, that attempts to remove irrelevant words from the output of an automatic audio transcription system, such as erroneously inserted function words, filled pauses, interjections, and word fragments, and to repair, to a certain extent, ungrammatical sentence fragments.

For this work we concentrate on the domain of political speeches, owing to the immediate availability of a parallel corpus of automatic audio transcriptions and the corresponding manually produced proceedings.

The system can effectively detect and correct several errors (mainly insertions) found in the alignment between a given automatic audio transcription and a reference transcription derived from the corresponding proceedings.

Preliminary results, expressed in terms of word error rate, show that the proposed approach yields a relative improvement of 5% over using the unprocessed automatic transcription of the audio.
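
To make the evaluation metric concrete, the following is a minimal sketch of how word error rate and a relative improvement could be computed. It assumes the standard definition of word error rate (word-level edit distance divided by the number of reference words); the function names and the toy sentences are hypothetical and do not come from the paper, and the numbers they produce are unrelated to the reported 5%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical toy data: a reference sentence, a raw transcription with
# inserted words, and a cleaned transcription with most insertions removed.
reference = "we must approve the budget today"
raw_asr = "uh we must we must approve the the budget today"
cleaned = "we must approve the the budget today"

baseline = word_error_rate(reference, raw_asr)
improved = word_error_rate(reference, cleaned)
print(f"baseline WER: {baseline:.3f}")
print(f"cleaned WER:  {improved:.3f}")
print(f"relative improvement: {relative_improvement(baseline, improved):.1%}")
```

Because insertions count as errors, the word error rate can exceed 1.0 when the hypothesis is much longer than the reference, which is why removing redundant words lowers the score directly.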

Bibliographic reference. Falavigna, Daniele (2011): "Redundancy reduction in ASR of spontaneous speech through statistical machine translation", in INTERSPEECH-2011, 1417-1420.