ISCA Archive Interspeech 2008

Development of SRI's translation systems for broadcast news and broadcast conversations

Jing Zheng, Wen Wang, Necip Fazil Ayan

We present our recent work on developing large-vocabulary Arabic-to-English and Chinese-to-English speech-to-text translation systems for the January 2008 Global Autonomous Language Exploitation (GALE) retest evaluation. Two audio genres were involved in the evaluation: broadcast news and broadcast conversation. Our system follows the hierarchical phrase-based translation approach and uses a two-pass decoding strategy: the first-pass integrated search generates n-best lists of 3000 unique hypotheses, which are then reranked by several different language models in the second pass.
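The two-pass strategy described above can be illustrated with a minimal sketch. It assumes that second-pass reranking adds a weighted sum of language model scores to the first-pass score; the abstract does not specify the combination, so this is only one plausible form. The names `unique_nbest` and `rerank`, the scoring callables, and the weights are all hypothetical and are not taken from the SRI system.

```python
from typing import Callable, List, Sequence, Tuple

# A hypothesis is (translation text, first-pass score); higher is better.
Hypothesis = Tuple[str, float]


def unique_nbest(hyps: Sequence[Hypothesis], n: int) -> List[Hypothesis]:
    """Keep the n best-scoring hypotheses with distinct text (hypothetical helper)."""
    seen = set()
    kept: List[Hypothesis] = []
    for text, score in sorted(hyps, key=lambda h: h[1], reverse=True):
        if text in seen:
            continue  # drop duplicates of an already kept translation
        seen.add(text)
        kept.append((text, score))
        if len(kept) == n:
            break
    return kept


def rerank(
    nbest: Sequence[Hypothesis],
    lms: Sequence[Callable[[str], float]],
    weights: Sequence[float],
) -> List[Hypothesis]:
    """Reorder hypotheses by first-pass score plus weighted LM scores.

    Assumption: a simple linear combination of scores; the real system
    may combine its models differently.
    """
    def total(hyp: Hypothesis) -> float:
        text, score = hyp
        return score + sum(w * lm(text) for w, lm in zip(weights, lms))

    return sorted(nbest, key=total, reverse=True)
```

In use, the first pass would supply the scored hypotheses, `unique_nbest(hyps, 3000)` would produce the list, and each entry in `lms` would wrap one second-pass language model that returns a log-probability for a hypothesis string.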

We emphasize our work on adapting the system, which was mostly trained on text data, to the speech genres, including number tokenization, punctuation compensation, and various optimization techniques. We present our results on several different tuning and testing data sets used for system development.


doi: 10.21437/Interspeech.2008-598

Cite as: Zheng, J., Wang, W., Ayan, N.F. (2008) Development of SRI's translation systems for broadcast news and broadcast conversations. Proc. Interspeech 2008, 2346-2349, doi: 10.21437/Interspeech.2008-598

@inproceedings{zheng08b_interspeech,
  author={Jing Zheng and Wen Wang and Necip Fazil Ayan},
  title={{Development of SRI's translation systems for broadcast news and broadcast conversations}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={2346--2349},
  doi={10.21437/Interspeech.2008-598}
}