Heterogeneous knowledge sources that model speech only at certain time frames are difficult to incorporate into speech recognition with standard multimodal fusion techniques. In this work, we present a new framework for integrating such sporadic knowledge into standard HMM-based ASR. In a first step, each knowledge source is mapped onto a logarithmic score using a sigmoid transfer function. These scores are then combined with the standard acoustic models by a weighted linear combination. Speech recognition experiments with broad phonetic knowledge sources on a broadcast news transcription task show improved recognition results when the knowledge provides information complementary to the ASR system.
Index Terms: multimodal fusion, landmark-driven ASR, event-based speech recognition
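The two steps named in the abstract (sigmoid mapping to a logarithmic score, then weighted linear combination with the acoustic score) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `slope`, `offset`, and `weight` parameters, and the choice to leave frames without knowledge unchanged are all assumptions made for illustration, and the scores are shown for a single hypothesis only.

```python
import math


def log_sigmoid_score(x, slope=1.0, offset=0.0):
    """Map a raw knowledge-source output x to log(sigmoid(slope * (x - offset))).

    slope and offset are hypothetical tuning parameters, not taken from the paper.
    """
    z = slope * (x - offset)
    # Numerically stable evaluation of log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def combine_scores(acoustic_loglik, knowledge, weight=0.5):
    """Add weighted knowledge scores to per-frame acoustic log-likelihoods.

    acoustic_loglik: list of per-frame acoustic log-likelihoods.
    knowledge: dict mapping frame index -> raw knowledge-source output;
               only the sporadic frames where the source fires are present.
    weight: hypothetical scaling factor for the knowledge score.
    Frames absent from `knowledge` keep their acoustic score (an assumption).
    """
    combined = []
    for t, loglik in enumerate(acoustic_loglik):
        if t in knowledge:
            loglik = loglik + weight * log_sigmoid_score(knowledge[t])
        combined.append(loglik)
    return combined


if __name__ == "__main__":
    acoustic = [-1.0, -2.0, -3.0]
    sporadic = {1: 0.0}  # the knowledge source fires only at frame 1
    print(combine_scores(acoustic, sporadic, weight=1.0))
```

In this sketch a knowledge source that is silent at a frame contributes nothing, so the decoder falls back on the acoustic model alone there; how the framework actually treats such frames is not stated in the abstract.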
Cite as: Ziegler, S., Gravier, G. (2013) A framework for integrating heterogeneous sporadic knowledge sources into automatic speech recognition. Proc. First Workshop on Speech, Language and Audio in Multimedia (SLAM 2013), 37-42
@inproceedings{ziegler13_slam, author={Stefan Ziegler and Guillaume Gravier}, title={{A framework for integrating heterogeneous sporadic knowledge sources into automatic speech recognition}}, year=2013, booktitle={Proc. First Workshop on Speech, Language and Audio in Multimedia (SLAM 2013)}, pages={37--42} }