We study lattice rescoring with knowledge scores for automatic speech recognition. A frame-based log-likelihood ratio (LLR) is adopted to measure the goodness of fit between a speech segment and the knowledge sources. We evaluate our approach in two applications: phone recognition and continuous connected-digit recognition. By incorporating knowledge scores obtained from 15 attribute detectors for place and manner of articulation, we reduced the phone error rate from 40.52% to 35.16% with monophone models. With triphone models, the error rate is further reduced to 33.42%. The same lattice rescoring algorithm is extended to connected-digit recognition on the TIDIGITS database without using any digit-specific training data. The digit error rate was reduced to 4.03% from the 4.54% obtained with conventional Viterbi decoding without knowledge scores.
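The abstract does not spell out the scoring formula, so the following is only a minimal sketch under stated assumptions. It assumes each attribute detector supplies per-frame log-likelihoods under a target model and an anti-model. The frame-based LLR for a segment is taken as the average per-frame difference, and a lattice arc's acoustic score is augmented by a weighted sum of the LLRs for the attributes assumed to characterize its phone label. The names `frame_llr`, `knowledge_score`, `rescore`, `PHONE_ATTRS`, the weight, and all numbers are hypothetical illustrations, not the authors' detectors, attribute inventory, or settings.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical detector output: attribute -> (target log-likelihoods, anti-model log-likelihoods), one value per frame.
Detectors = Dict[str, Tuple[Sequence[float], Sequence[float]]]


@dataclass
class Arc:
    """A hypothetical lattice arc: a phone label spanning frames [start, end)."""
    label: str
    start: int
    end: int
    acoustic_score: float  # log-likelihood assigned by the conventional HMM decoder


def frame_llr(target_ll: Sequence[float], anti_ll: Sequence[float], start: int, end: int) -> float:
    """Frame-averaged log-likelihood ratio of one attribute detector over [start, end)."""
    if end <= start:
        raise ValueError("segment must contain at least one frame")
    total = sum(target_ll[t] - anti_ll[t] for t in range(start, end))
    return total / (end - start)


def knowledge_score(arc: Arc, phone_attrs: Dict[str, List[str]], detectors: Detectors) -> float:
    """Sum of frame-averaged LLRs for the attributes assumed to describe arc.label."""
    return sum(frame_llr(*detectors[attr], arc.start, arc.end) for attr in phone_attrs[arc.label])


def rescore(arc: Arc, phone_attrs: Dict[str, List[str]], detectors: Detectors, weight: float) -> float:
    """Acoustic score plus a weighted knowledge score (a simple linear combination, assumed here)."""
    return arc.acoustic_score + weight * knowledge_score(arc, phone_attrs, detectors)


# Hypothetical phone-to-attribute map (place and manner of articulation).
PHONE_ATTRS = {"b": ["stop", "labial"], "d": ["stop", "alveolar"]}

# Made-up detector outputs for a 4-frame segment.
detectors: Detectors = {
    "stop": ([-1.0, -1.25, -0.75, -1.0], [-2.0, -2.25, -1.75, -2.0]),  # LLR +1.0 per frame
    "labial": ([-1.5] * 4, [-1.0] * 4),                                # LLR -0.5 per frame
    "alveolar": ([-2.0] * 4, [-1.0] * 4),                              # LLR -1.0 per frame
}

# Two competing arcs over the same span; acoustically "d" is slightly better.
arc_b = Arc("b", 0, 4, acoustic_score=-50.0)
arc_d = Arc("d", 0, 4, acoustic_score=-49.5)
arcs = [arc_b, arc_d]

weight = 2.0  # hypothetical knowledge-score weight
best = max(arcs, key=lambda a: rescore(a, PHONE_ATTRS, detectors, weight))
for a in arcs:
    print(a.label, a.acoustic_score, rescore(a, PHONE_ATTRS, detectors, weight))
print("best:", best.label)
```

With these made-up numbers, the knowledge scores are 0.5 for "b" and 0.0 for "d", so the rescored arc scores become -49.0 and -49.5 and the decision flips from "d" to "b". A real lattice rescorer would apply such arc scores across the whole lattice and then search for the best path; comparing two arcs over one span only isolates the scoring step.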
Cite as: Siniscalchi, S.M., Li, J., Lee, C.-H. (2006) A study on lattice rescoring with knowledge scores for automatic speech recognition. Proc. Interspeech 2006, paper 1319-Mon3A2O.1, doi: 10.21437/Interspeech.2006-198
@inproceedings{siniscalchi06_interspeech, author={Sabato Marco Siniscalchi and Jinyu Li and Chin-Hui Lee}, title={{A study on lattice rescoring with knowledge scores for automatic speech recognition}}, year=2006, booktitle={Proc. Interspeech 2006}, pages={paper 1319-Mon3A2O.1}, doi={10.21437/Interspeech.2006-198} }