Capturing the large variability of conversational speech within purely phone-based speech recognizers is virtually impossible. Earlier work has shown that suprasegmental features such as speaking rate, duration, and syllabic, syntactic, and semantic structure are important predictors of pronunciation variation. A new recognition toolkit has been developed to couple these predictors more tightly with pronunciation, duration, and acoustic modeling. The phonetic transcription of speech has been generalized to an attribute-based representation, which makes it possible to integrate suprasegmental, non-phonetic features. A pronunciation model is trained to augment the attribute transcription by marking possible pronunciation effects, which the acoustic model induction algorithm then takes into account. The paper presents a one-pass, time-synchronous decoder based on finite state machines and a single prefix tree, which efficiently decodes highly spontaneous speech within this new representational framework.
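The abstract does not detail the decoder's internals. As a generic illustration of the "single prefix tree" idea, here is a minimal sketch of a lexical prefix tree (tree lexicon). All pronunciations are merged into one tree, so words that share initial phones also share search nodes. The toy lexicon, phone symbols, and helper names (`TreeNode`, `build_prefix_tree`, `count_nodes`) are hypothetical and not taken from the authors' toolkit.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TreeNode:
    """One node of the lexical prefix tree, labelled by a phone (None for the root)."""
    phone: Optional[str] = None
    children: Dict[str, "TreeNode"] = field(default_factory=dict)
    words: List[str] = field(default_factory=list)  # words whose pronunciation ends here


def build_prefix_tree(lexicon: Dict[str, List[str]]) -> TreeNode:
    """Merge all pronunciations into one tree so shared phone prefixes share nodes."""
    root = TreeNode()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            # Reuse an existing child if another word already starts this way.
            if phone not in node.children:
                node.children[phone] = TreeNode(phone=phone)
            node = node.children[phone]
        node.words.append(word)  # word identity is only known at this word-end node
    return root


def count_nodes(node: TreeNode) -> int:
    """Count phone nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical toy lexicon; phone symbols are illustrative only.
lexicon = {
    "can": ["k", "ae", "n"],
    "cannot": ["k", "ae", "n", "aa", "t"],
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
}

tree = build_prefix_tree(lexicon)
flat_size = sum(len(p) for p in lexicon.values())
print(flat_size, count_nodes(tree))  # prints "14 9"
```

A flat lexicon would need 14 phone nodes for these four words. The tree needs only 9, because "can", "cannot", and "cat" share the prefix `k ae`. A side effect is that the word identity is only known at word-end nodes. A decoder built on such a tree must therefore apply language-model scores at or after those nodes, or approximate them earlier. This abstract does not say how the authors' decoder handles this.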
Cite as: Finke, M., Fritsch, J., Koll, D., Waibel, A. (1999) Modeling and efficient decoding of large vocabulary conversational speech. Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999), 467-470, doi: 10.21437/Eurospeech.1999-120
@inproceedings{finke99_eurospeech, author={Michael Finke and Jürgen Fritsch and Detlef Koll and Alex Waibel}, title={{Modeling and efficient decoding of large vocabulary conversational speech}}, year=1999, booktitle={Proc. 6th European Conference on Speech Communication and Technology (Eurospeech 1999)}, pages={467--470}, doi={10.21437/Eurospeech.1999-120} }