INTERSPEECH 2012
13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Parallel Combination of Multilingual Speech Streams for Improved ASR

João Miranda (1,2), João Paulo da Silva Neto (1), Alan W. Black (2)

(1) INESC-ID / Instituto Superior Técnico, Lisboa, Portugal
(2) School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

In a growing number of applications, such as simultaneous interpretation, audio or text conveying the same information may be available in different languages. These different views contain redundant information that can be exploited to enhance the performance of speech and language processing applications. We propose a method that directly integrates ASR word graphs or lattices with phrase tables from an SMT system in order to combine such parallel speech data and improve ASR performance. We apply this technique to speeches from four European Parliament committees and obtain a 16.6% relative improvement in WER (20.8% after a second iteration) when the Portuguese and Spanish interpreted versions are combined with the original English speeches. Our results indicate that further improvements may be possible by including additional languages.

Index Terms: multistream combination, speech recognition, machine translation

Bibliographic reference. Miranda, João / Neto, João Paulo da Silva / Black, Alan W. (2012): "Parallel combination of multilingual speech streams for improved ASR", in INTERSPEECH-2012, 1027-1030.