Previously, we proposed a flexible two-layered speech recogniser architecture, called FLaVoR. In the first layer, an unconstrained, task-independent phone recogniser generates a phone lattice. Only in the second layer are the task-specific lexicon and language model applied to decode the phone lattice and produce a word-level recognition result. In this paper, we present a further evaluation of the FLaVoR architecture. The performance of a classical single-layered architecture is compared with that of the FLaVoR architecture on two recognition tasks, using the same acoustic, lexical and language models. On the large vocabulary Wall Street Journal 5k and 20k benchmark tasks, the two-layered architecture resulted in slightly, but not significantly, better word error rates. On a reading error detection task for a reading tutor for children, the FLaVoR architecture clearly outperformed the single-layered architecture.
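To make the two-layer idea concrete, the following is a minimal toy sketch of the second layer only; it is not the FLaVoR implementation. It assumes a phone lattice given as scored edges between integer nodes numbered in topological order, a lexicon with one exact pronunciation per word, and a unigram language model. The function `decode`, the data layout and all scores are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

def decode(edges, lexicon, lm_logprob, start, final):
    """Toy second-layer decoder: best word sequence through a phone lattice.

    Assumptions (illustrative only): node ids increase along every edge,
    pronunciations must match lattice phones exactly, and the language
    model is a unigram table of log-probabilities.
    """
    out = defaultdict(list)
    for src, dst, phone, score in edges:
        out[src].append((dst, phone, score))

    def match(node, phones):
        # Yield (end_node, acoustic_score) for each path spelling `phones`.
        if not phones:
            yield node, 0.0
            return
        for dst, phone, score in out[node]:
            if phone == phones[0]:
                for end, rest in match(dst, phones[1:]):
                    yield end, score + rest

    best = {start: (0.0, [])}  # node -> (total log score, words so far)
    for node in sorted({n for e in edges for n in e[:2]}):
        if node not in best:
            continue
        score, words = best[node]
        for word, phones in lexicon.items():
            for end, acoustic in match(node, phones):
                cand = score + acoustic + lm_logprob[word]
                if end not in best or cand > best[end][0]:
                    best[end] = (cand, words + [word])
    return best.get(final)

# First-layer output (hypothetical): the phone recogniser slightly prefers "p".
EDGES = [
    (0, 1, "dh", -0.1), (1, 2, "ah", -0.2), (2, 3, "k", -0.1),
    (3, 4, "ae", -0.3), (4, 5, "t", -0.9), (4, 5, "p", -0.4),
]
# Task-specific knowledge, applied only in the second layer.
LEXICON = {"the": ("dh", "ah"), "cat": ("k", "ae", "t"), "cap": ("k", "ae", "p")}
LM = {"the": -1.0, "cat": -1.5, "cap": -3.0}

score, words = decode(EDGES, LEXICON, LM, start=0, final=5)
print(words)  # → ['the', 'cat']
```

In this toy case the acoustically preferred final phone is "p", yet the language model score makes "the cat" the best word sequence. The lexicon and language model can be swapped without rerunning the phone recogniser, which is the kind of flexibility the two-layered design aims for.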
Cite as: Duchateau, J., Demuynck, K., Van hamme, H. (2009) Evaluation of phone lattice based speech decoding. Proc. Interspeech 2009, 1179-1182, doi: 10.21437/Interspeech.2009-342
@inproceedings{duchateau09_interspeech, author={Jacques Duchateau and Kris Demuynck and Hugo {Van hamme}}, title={{Evaluation of phone lattice based speech decoding}}, year=2009, booktitle={Proc. Interspeech 2009}, pages={1179--1182}, doi={10.21437/Interspeech.2009-342} }