Debates in the European Parliament are simultaneously translated into the official languages of the Union. These interpretations are broadcast live via satellite on separate audio channels. After several months, the parliamentary proceedings are published as final text editions (FTE). FTEs are formatted for an easy readability and can differ significantly from the original speeches and the live broadcast interpretations. We examine the impact on German word error rate (WER) when introducing supervision based on German FTEs and supervision based on German automatic translations extracted from the English and Spanish audio. We show that FTE based supervision and additional interpretation based supervision provide significant reductions in WER. We successfully apply FTE supervised acoustic model (AM) training using 143h of recordings. Combining the new AM with the mentioned supervision techniques, we achieve a significant WER reduction of 13.3% relative.
Bibliographic reference. Paulik, Matthias / Waibel, Alex (2008): "Lightly supervised acoustic model training on EPPS recordings", In INTERSPEECH-2008, 224-227.