Sixth European Conference on Speech Communication and Technology
In speech recognition systems, the training environment is generally required to match the decoding environment; any mismatch between the two may degrade performance. This paper aims to improve the performance of a speech recognition system by compensating for mismatches between training and decoding. The baseline system is a multiple-pass decoding system for transcribing broadcast news, which achieved a 30.5% word error rate on the 1997 DARPA HUB4E test set. Three approaches were investigated: (1) deleting long silences in both training and decoding utterances; (2) enlarging the second-pass decoding dictionary; and (3) merging utterance fragments into complete sentences. On the 1997 test set, these approaches yielded absolute error reductions of 2.8%, 0.3%, and 2.3%, respectively, and the combined approach achieved an absolute error reduction of more than 4%. On the official 1998 DARPA HUB4E evaluation, the resulting system achieved a 27.9% word error rate on the 97 part of the evaluation data and a 23.6% word error rate on the 98 part.
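The abstract does not say how long silences were detected or what duration counts as "long". The following is a minimal sketch of approach (1) under stated assumptions: each utterance already has frame-level speech/silence labels (for example from a forced alignment or a silence detector), and `max_sil_frames` is a hypothetical threshold. Any run of silence longer than the threshold is shortened to that length. Applying the same function to both training and decoding utterances is what would keep the two environments consistent. The function name and parameter are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def drop_long_silence(is_speech: Sequence[bool], max_sil_frames: int) -> List[int]:
    """Return the indices of frames to keep (hypothetical helper).

    Assumes is_speech[i] is True for a speech frame and False for a
    silence frame. Speech frames are always kept. Within each run of
    consecutive silence frames, only the first max_sil_frames frames
    are kept, so long silences are shortened rather than removed entirely.
    """
    if max_sil_frames < 0:
        raise ValueError("max_sil_frames must be non-negative")
    keep: List[int] = []
    sil_run = 0  # length of the current run of consecutive silence frames
    for i, speech in enumerate(is_speech):
        if speech:
            sil_run = 0  # a speech frame ends any silence run
            keep.append(i)
        else:
            sil_run += 1
            if sil_run <= max_sil_frames:
                keep.append(i)
    return keep


if __name__ == "__main__":
    labels = [True, False, False, False, False, True]
    print(drop_long_silence(labels, 2))  # [0, 1, 2, 5]
```

The returned indices can then be used to select rows from the utterance's feature matrix. Truncating silence to a fixed maximum, instead of deleting it, is one design choice that preserves short pauses at word boundaries. The paper may have used a different rule.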
Bibliographic reference. Wu, Xintian / Yan, Yonghong (1999): "Development of the 1998 OGI-FONIX broadcast news transcription system", In EUROSPEECH'99, 683-686.