This paper describes a system, based on statistical machine translation, that attempts to remove non-relevant words from the output of an automatic audio transcription system, such as erroneously inserted function words, filled pauses, interjections and word fragments, and to repair, to a certain extent, ungrammatical sentence fragments.
For this work we concentrate on the domain of political speeches, owing to the immediate availability of a parallel corpus of automatic audio transcriptions and the corresponding manually produced proceedings.
The system can effectively detect and correct several errors (mainly insertions) that appear in the alignment between a given automatic audio transcription and a reference transcription derived from the corresponding proceedings.
Preliminary results, expressed in terms of word error rate, show that the proposed approach yields a relative improvement of 5% over using the raw automatic transcription of the audio.
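As an illustration of the metric only, the following minimal sketch shows how word error rate and a relative improvement are conventionally computed. It is not the evaluation code used in this work: the function names `word_error_rate` and `relative_improvement`, the whitespace tokenization, and the toy sentences and numbers in the usage lines are all assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes plain whitespace tokenization (an assumption of this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                       # deletion (reference word missing)
                curr[j - 1] + 1,                   # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,   # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy usage with invented data: one inserted filled pause ("uh").
wer_with_filler = word_error_rate("the cat sat", "the uh cat sat")
wer_after_cleanup = word_error_rate("the cat sat", "the cat sat")
# Invented baseline and post-processing WER values, for illustration only.
gain = relative_improvement(0.20, 0.19)
```

In this toy case a single inserted filled pause counts as one insertion error against a three-word reference, and removing it brings the error count to zero, which is the kind of effect the redundancy-reduction step targets.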
Bibliographic reference. Falavigna, Daniele (2011): "Redundancy reduction in ASR of spontaneous speech through statistical machine translation", In INTERSPEECH-2011, 1417-1420.