International Workshop on Spoken Language Translation (IWSLT) 2004
Keihanna Science City, Kyoto, Japan
Feature selection is critical to the performance of maximum-entropy-based statistical concept-based spoken language translation. The source-language spoken message is first parsed into a structured conceptual tree, and the target-language output is then generated from that tree using maximum entropy modeling. To improve feature selection in this maximum entropy approach, a new concept-word feature is proposed, which exploits both concept-level and word-level information. It thus enables the design of concise yet informative concept sets and eases both annotation and parsing efforts. The concept generation error rate is reduced by over 90% on the training set and by 7% on the test set of our limited-domain speech translation corpus. To alleviate the data sparseness problem, multiple feature sets are proposed and employed, which achieve a further 10%-14% error rate reduction. Improvements are also achieved in our experiments on speech-to-speech translation.
Bibliographic reference. Gu, Liang / Gao, Yuqing (2004): "On feature selection in maximum entropy approach to statistical concept-based speech-to-speech translation", In IWSLT-2004, 115-121.