Sixth European Conference on Speech Communication and Technology
Capturing the large variability of conversational speech in the framework of purely phone based speech recognizers is virtually impossible. It has been shown earlier that suprasegmental features such as speaking rate, duration and syllabic, syntactic and semantic structure are important predictors of pronunciation variation. In order to allow for a tighter coupling of these predictors of pronunciation, duration and acoustic modeling a new recognition toolkit has been developed. The phonetic transcription of speech has been generalized to an attribute based representation, thus enabling the integration of suprasegmental, non-phonetic features. A pronunciation model is trained to augment the attribute transcription to mark possible pronunciation effects which are then taken into account by the acoustic model induction algorithm. A finite state machine single-prefix-tree, one-pass, time-synchronous decoder is presented that efficiently decodes highly spontaneous speech within this new representational framework.
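The abstract does not describe the decoder's internals, so the following is only a minimal sketch of the general prefix-tree idea such decoders build on: pronunciations that share leading phones are merged so that the shared arcs are evaluated once. The names `Node`, `build_prefix_tree`, `count_nodes` and the toy lexicon with its phone symbols are hypothetical and are not taken from the paper or its toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a pronunciation prefix tree (illustrative only)."""
    children: dict = field(default_factory=dict)  # phone symbol -> Node
    words: list = field(default_factory=list)     # words whose pronunciation ends here


def build_prefix_tree(lexicon):
    """Merge pronunciations that share leading phones into a single tree."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            # Reuse the existing child for this phone, or create a new one.
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def count_nodes(node):
    """Count non-root nodes, i.e. the phone arcs a search would have to expand."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical toy lexicon; the phone symbols are assumptions for illustration.
LEXICON = {
    "speech": ("S", "P", "IY", "CH"),
    "speed": ("S", "P", "IY", "D"),
    "speak": ("S", "P", "IY", "K"),
    "spin": ("S", "P", "IH", "N"),
}

tree = build_prefix_tree(LEXICON)
flat_arcs = sum(len(phones) for phones in LEXICON.values())  # one arc per phone per word
shared_arcs = count_nodes(tree)                              # arcs after prefix sharing
```

In this toy lexicon the four words need 16 phone arcs when stored separately and 8 once their common prefixes are shared. The sketch leaves out everything specific to the paper: the attribute based representation, the finite state machine formulation, and the time-synchronous search itself.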
Bibliographic reference. Finke, Michael / Fritsch, Jürgen / Koll, Detlef / Waibel, Alex (1999): "Modeling and efficient decoding of large vocabulary conversational speech", In EUROSPEECH'99, 467-470.