The three main knowledge sources used in automatic speech recognition (ASR), namely the acoustic models, a dictionary and a language model, are usually designed and optimized in isolation. Our previous work [1] proposed a methodology for jointly tuning these parameters, based on the integration of the resources as a finite-state graph, whose transition weights are trained discriminatively. This paper extends the training framework to a large vocabulary task, the automatic transcription of French broadcast news. We propose several fast decoding techniques to make the training practical. Experiments show that an absolute reduction of 1% in word error rate (WER) can be obtained. We conclude the paper with an appraisal of the potential of this approach on large vocabulary ASR tasks.
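The paper itself specifies the exact training criterion; purely as an illustration of the general idea of discriminatively adjusting transition weights on a decoding graph, the sketch below shows a generic perceptron-style update. All names (arc identifiers, the update function, the learning rate) are hypothetical and not taken from the paper.

```python
# A minimal sketch, NOT the authors' method: a perceptron-style update
# that nudges decoding-graph transition weights toward arcs on the
# reference (oracle) path and away from arcs on the current best path.
from collections import defaultdict

def perceptron_update(weights, oracle_arcs, hypothesis_arcs, lr=0.1):
    """Update arc weights from one utterance.

    weights: dict mapping arc id -> current weight
    oracle_arcs: arc ids along the reference transcription's path
    hypothesis_arcs: arc ids along the current 1-best decoding path
    lr: learning rate (hypothetical value)
    """
    delta = defaultdict(float)
    for arc in oracle_arcs:
        delta[arc] += 1.0   # reward arcs the reference path uses
    for arc in hypothesis_arcs:
        delta[arc] -= 1.0   # penalize arcs the erroneous hypothesis uses
    for arc, d in delta.items():
        weights[arc] = weights.get(arc, 0.0) + lr * d
    return weights

# Example: arcs shared by both paths cancel out and keep their weight.
w = perceptron_update({}, oracle_arcs=["a1", "a2"], hypothesis_arcs=["a1", "a3"])
print(w)  # {'a2': 0.1, 'a3': -0.1} ('a1' cancels, so it is unchanged)
```

In this family of methods, the cost of each update is dominated by decoding the hypothesis path, which is why the fast decoding techniques mentioned in the abstract matter for making training practical at large vocabulary scale.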
Cite as: Lin, S.-S., Yvon, F. (2007) Optimization on decoding graphs by discriminative training. Proc. Interspeech 2007, 1737-1740, doi: 10.21437/Interspeech.2007-487
@inproceedings{lin07c_interspeech,
  author={Shiuan-Sung Lin and François Yvon},
  title={{Optimization on decoding graphs by discriminative training}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={1737--1740},
  doi={10.21437/Interspeech.2007-487}
}