International Workshop on Spoken Language Translation (IWSLT) 2004

Keihanna Science City, Kyoto, Japan
September 30-October 1, 2004

On Feature Selection in Maximum Entropy Approach to Statistical Concept-based Speech-to-Speech Translation

Liang Gu, Yuqing Gao

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

Feature selection is critical to the performance of maximum-entropy-based statistical concept-based spoken language translation. The source language spoken message is first parsed into a structured conceptual tree, and then generated into the target language based on maximum entropy modeling. To improve feature selection in this maximum entropy approach, a new concept-word feature is proposed, which exploits both concept-level and word-level information. It thus enables the design of concise yet informative concept sets and eases both annotation and parsing efforts. The concept generation error rate is reduced by over 90% on the training set and 7% on the test set in our speech translation corpus within limited domains. To alleviate the data sparseness problem, multiple feature sets are proposed and employed, which achieve a further 10%-14% error rate reduction. Improvements are also achieved in our experiments on speech-to-speech translation.
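As a rough illustration of how a maximum entropy model can combine concept-level and word-level evidence, the sketch below scores candidate target concepts with a standard log-linear model. It is a minimal sketch under stated assumptions, not the feature definition used in the paper: the function names, the feature templates, the concept labels, and the weight values are all hypothetical.

```python
import math

def concept_word_features(prev_concept, word, candidate):
    """Return hypothetical binary feature names for one candidate concept.

    The three templates (concept-only, word-only, joint) are assumptions
    made for illustration; the paper's actual templates may differ.
    """
    return [
        f"concept={prev_concept}|cand={candidate}",              # concept-level
        f"word={word}|cand={candidate}",                         # word-level
        f"concept={prev_concept}|word={word}|cand={candidate}",  # joint
    ]

def maxent_probs(prev_concept, word, candidates, weights):
    """Log-linear model: p(c | context) = exp(sum of active weights) / Z."""
    scores = {
        c: sum(weights.get(f, 0.0)
               for f in concept_word_features(prev_concept, word, c))
        for c in candidates
    }
    z = sum(math.exp(s) for s in scores.values())  # normalization constant
    return {c: math.exp(s) / z for c, s in scores.items()}

# Hypothetical weights; in practice these would be estimated from data.
weights = {"word=fever|cand=SYMPTOM": 1.5}
probs = maxent_probs("QUERY", "fever", ["SYMPTOM", "TIME"], weights)
```

With these made-up weights, the word-level feature shifts probability toward the `SYMPTOM` candidate even though no concept-level feature has a nonzero weight, which is the kind of complementary evidence a combined concept-word feature is meant to provide.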

Bibliographic reference.  Gu, Liang / Gao, Yuqing (2004): "On feature selection in maximum entropy approach to statistical concept-based speech-to-speech translation", In IWSLT-2004, 115-121.