BBN's 20 times real-time (20xRT) Mandarin conversational telephone speech (CTS) recognition system achieved the lowest character error rate (CER) in the Rich Transcription 2004 Fall (RT04F) evaluation conducted by NIST. This paper focuses on the work we have done since the evaluation. That work includes porting additional acoustic modeling technologies that we had developed for English, such as long-span features and a modified HLDA-SAT; diagnosing the problems we encountered in the evaluation, such as problems with pitch, silence chopping, and automatic segmentation; and the solutions we found for those problems. With these new technologies and fixes incorporated, together with a new design of the 20xRT system architecture, we achieved a 2.1% absolute reduction in CER on the RT04 evaluation test set.
Cite as: Ma, J.Z., Matsoukas, S. (2005) Improvements to the BBN RT04 Mandarin conversational telephone speech recognition system. Proc. Interspeech 2005, 1625-1628, doi: 10.21437/Interspeech.2005-534
@inproceedings{ma05c_interspeech,
  author={Jeff Z. Ma and Spyros Matsoukas},
  title={{Improvements to the BBN RT04 Mandarin conversational telephone speech recognition system}},
  year=2005,
  booktitle={Proc. Interspeech 2005},
  pages={1625--1628},
  doi={10.21437/Interspeech.2005-534}
}