Speech Recognition and Intrinsic Variation (SRIV2006), Toulouse, France
In this paper, a novel architecture is proposed for the speech recognition component in a reading tutor. Decoding starts with an unconstrained phoneme recogniser that produces a phoneme lattice. Next, the best path through the lattice is searched for using a phoneme-level finite state transducer that models the words in the sentence to be read and that includes provisions for expected reading miscues as well as for unexpected events and disfluencies. An advantage of the architecture is its modularity: the first module is a generic phoneme recogniser, while the second contains all task-specific information. Moreover, the intermediate phoneme lattice adds flexibility to the system, as lattice re-scoring allows elaborate acoustic features that do not fit in a typical HMM-based recogniser, for instance segment-based features, to be incorporated at an early stage of recognition. Experiments with the proposed system show favorable reading miscue detection and false alarm rates compared to the state-of-the-art systems described in the literature. In addition, we introduce an efficient VTLN system that avoids delays in the recognition, which would be incompatible with the immediate feedback often needed in a reading tutor. Using the VTLN, the acoustic modelling for children between 5 and 11 years old could be improved considerably.
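The two-stage idea described above can be illustrated with a small sketch. It is not the authors' implementation: the real second stage is a finite state transducer, whereas this example assumes a much simpler stand-in, namely a dynamic-programming alignment between a toy phoneme lattice and the expected phoneme sequence. The function name `best_constrained_path`, the edge format, and the penalty values `sub_pen`, `ins_pen` and `del_pen` are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def best_constrained_path(edges, n_nodes, target,
                          sub_pen=2.0, ins_pen=2.0, del_pen=2.0):
    """Cheapest path through a phoneme lattice, constrained by `target`.

    edges   : list of (start_node, end_node, phoneme, acoustic_cost);
              nodes are assumed to be numbered in topological order,
              with node 0 as the start and node n_nodes - 1 as the end.
    target  : expected phoneme sequence of the text to be read.
    *_pen   : hypothetical penalties standing in for miscue modelling.
    """
    inf = float("inf")
    outgoing = defaultdict(list)
    for start, end, phone, cost in edges:
        if not 0 <= start < end < n_nodes:
            raise ValueError("edges must go from a lower to a higher node")
        outgoing[start].append((end, phone, cost))

    n_target = len(target)
    # best[node][j]: cheapest cost to reach `node` having consumed
    # the first j phonemes of the target.
    best = [[inf] * (n_target + 1) for _ in range(n_nodes)]
    best[0][0] = 0.0

    for node in range(n_nodes):
        for j in range(n_target + 1):
            cur = best[node][j]
            if cur == inf:
                continue
            # Deletion: the reader skipped an expected phoneme.
            if j < n_target:
                best[node][j + 1] = min(best[node][j + 1], cur + del_pen)
            for end, phone, cost in outgoing[node]:
                # Insertion: the lattice phoneme is not in the target.
                best[end][j] = min(best[end][j], cur + cost + ins_pen)
                if j < n_target:
                    # Match (no penalty) or substitution (penalised).
                    extra = 0.0 if phone == target[j] else sub_pen
                    best[end][j + 1] = min(best[end][j + 1],
                                           cur + cost + extra)
    return best[n_nodes - 1][n_target]


# Toy lattice: the acoustically cheapest path is "g ae d" (cost 2.0),
# but the expected word constrains which path is chosen.
toy_edges = [
    (0, 1, "k", 1.0), (0, 1, "g", 0.5),
    (1, 2, "ae", 1.0),
    (2, 3, "t", 1.0), (2, 3, "d", 0.5),
]
print(best_constrained_path(toy_edges, 4, ["k", "ae", "t"]))  # 3.0
print(best_constrained_path(toy_edges, 4, ["k", "ae", "p"]))  # 4.5
```

In this toy setting, the gap between the constrained cost and the unconstrained lattice cost could serve as a rough indicator of a miscue. The sketch also reflects the modularity argument: the lattice comes from a generic first stage, and only the second stage needs to know the text being read.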
Bibliographic reference. Duchateau, Jacques / Wigham, Mari / Demuynck, Kris / Hamme, Hugo van (2006): "A flexible recogniser architecture in a reading tutor for children", In SRIV-2006, 59-64.