Sixth European Conference on Speech Communication and Technology

Budapest, Hungary
September 5-9, 1999

Modeling and Efficient Decoding of Large Vocabulary Conversational Speech

Michael Finke, Jürgen Fritsch, Detlef Koll, Alex Waibel

Interactive Systems Inc., Pittsburgh, PA, USA

Capturing the large variability of conversational speech in the framework of purely phone based speech recognizers is virtually impossible. It has been shown earlier that suprasegmental features such as speaking rate, duration and syllabic, syntactic and semantic structure are important predictors of pronunciation variation. In order to allow for a tighter coupling of these predictors of pronunciation, duration and acoustic modeling a new recognition toolkit has been developed. The phonetic transcription of speech has been generalized to an attribute based representation, thus enabling the integration of suprasegmental, non-phonetic features. A pronunciation model is trained to augment the attribute transcription to mark possible pronunciation effects which are then taken into account by the acoustic model induction algorithm. A finite state machine single-prefix-tree, one-pass, time-synchronous decoder is presented that efficiently decodes highly spontaneous speech within this new representational framework.


Bibliographic reference.  Finke, Michael / Fritsch, Jürgen / Koll, Detlef / Waibel, Alex (1999): "Modeling and efficient decoding of large vocabulary conversational speech", In EUROSPEECH'99, 467-470.