International Workshop on Spoken Language Translation (IWSLT) 2004

Keihanna Science City, Kyoto, Japan
September 30-October 1, 2004

Towards Named Entity Extraction and Translation in Spoken Language Translation

Fei Huang, Stephan Vogel, Alex Waibel

Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

In this paper we propose a new method for detecting and translating named entities (NEs) in spoken language, e.g., Chinese broadcast news. The approach uses confidence measures to detect possible NE regions in less reliably recognized hypotheses. Each possible NE boundary within such a region is compared with candidate NEs from retrieved documents based on their acoustic similarities and semantic correlations. These candidate NEs are re-ranked by additionally incorporating general and topic-specific language models to measure NE context consistency. This approach, combined with HMM-based NE extraction on confidently recognized words, improves the NE extraction F-score from 66% to 71% and NE translation quality from 69% to 73% over the baseline method. Systematic comparisons of NE translation quality across different levels of speech input quality are also presented.
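The re-ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the individual scores are combined, so the weighted linear combination below is an assumption made for illustration only, as are the names Candidate and rerank, the weights, and all score values.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate NE taken from retrieved documents (hypothetical structure)."""
    name: str
    acoustic_sim: float    # similarity to the recognized region, assumed in [0, 1]
    semantic_corr: float   # correlation with the surrounding context, assumed in [0, 1]
    lm_general: float      # log-probability under a general language model
    lm_topic: float        # log-probability under a topic-specific language model


def score(c, weights=(1.0, 1.0, 0.5, 0.5)):
    """Weighted linear combination of the four scores (an assumed form)."""
    w_acoustic, w_semantic, w_general, w_topic = weights
    return (w_acoustic * c.acoustic_sim
            + w_semantic * c.semantic_corr
            + w_general * c.lm_general
            + w_topic * c.lm_topic)


def rerank(candidates, weights=(1.0, 1.0, 0.5, 0.5)):
    """Return the candidates sorted from highest to lowest combined score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


if __name__ == "__main__":
    # Made-up values: cand_a sounds closer, cand_b fits the context better.
    cand_a = Candidate("cand_a", 0.9, 0.2, -2.0, -1.0)
    cand_b = Candidate("cand_b", 0.7, 0.8, -1.0, -0.5)
    for c in rerank([cand_a, cand_b]):
        print(c.name, round(score(c), 2))
```

In this made-up example the candidate with the better context fit is ranked first even though its acoustic similarity is lower, which is the kind of correction the language model terms are meant to provide.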


Bibliographic reference. Huang, Fei / Vogel, Stephan / Waibel, Alex (2004): "Towards named entity extraction and translation in spoken language translation", in IWSLT-2004, 131-137.