7th International Conference on Spoken Language Processing
September 16-20, 2002
Recently, speech recognition systems based on articulatory features such as "voicing" or the position of lips and tongue have gained interest, because they promise advantages with respect to robustness and permit new adaptation methods to compensate for channel, noise, and speaker variability. These approaches are also interesting from a general point of view, because their models use phonological and phonetic concepts, which allow for a richer description of a speech act than the sequence of HMM-states, which is the prevalent ASR architecture today. In this work, we present a multi-stream architecture, in which CD-HMMS are supported by detectors for articulatory features, using a linear combination of log-likelihood scores. This multi-stream approach results in a 15% reduction of WER on a read Broadcast-News (BN) task and improves performance on a spontaneous scheduling task (ESST) by 7%. The proposed architecture potentially allows for new speaker and channel adaptation schemes, including stream asynchronicity.
Bibliographic reference. Metze, Florian / Waibel, Alex (2002): "A flexible stream architecture for ASR using articulatory features", In ICSLP-2002, 2133-2136.