ISCA Archive SLAM 2013

A framework for integrating heterogeneous sporadic knowledge sources into automatic speech recognition

Stefan Ziegler, Guillaume Gravier

Heterogeneous knowledge sources that model speech only at certain time frames are difficult to incorporate into speech recognition with standard multimodal fusion techniques. In this work, we present a new framework for integrating this sporadic knowledge into standard HMM-based ASR. In a first step, each knowledge source is mapped onto a logarithmic score by a sigmoid transfer function. These scores are then combined with the standard acoustic models by weighted linear combination. Speech recognition experiments with broad phonetic knowledge sources on a broadcast news transcription task show improved recognition results when the knowledge provides information complementary to the ASR system.
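
The two steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sigmoid parameters `slope` and `offset`, the `weight`, and the per-frame dictionary layout are all assumptions made for illustration, and the abstract does not specify how the scores attach to individual HMM states. The sketch only shows a sigmoid mapping to a log score, followed by a weighted linear combination applied at the frames where a knowledge source fires.

```python
import math


def sigmoid_log_score(raw, slope=1.0, offset=0.0):
    """Map a raw knowledge-source output to log(sigmoid(slope * (raw - offset))).

    `slope` and `offset` are hypothetical tuning parameters.
    """
    z = slope * (raw - offset)
    # Numerically stable evaluation of log(1 / (1 + exp(-z))).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def combine(acoustic_loglik, knowledge, weight=0.5, slope=1.0, offset=0.0):
    """Add weighted knowledge log scores to per-frame acoustic log-likelihoods.

    acoustic_loglik: list of acoustic log-likelihoods, one per frame.
    knowledge: dict mapping frame index -> raw knowledge-source output.
               Frames without an entry (the sporadic case) stay unchanged.
    """
    combined = list(acoustic_loglik)
    for frame, raw in knowledge.items():
        combined[frame] += weight * sigmoid_log_score(raw, slope, offset)
    return combined


if __name__ == "__main__":
    # Knowledge is available only at frame 1; frames 0 and 2 keep their scores.
    print(combine([-1.0, -2.0, -3.0], {1: 0.0}, weight=1.0))
```

Because the knowledge score is added only where an entry exists, frames without knowledge fall back to the plain acoustic score, which is one simple way to handle sources that are active only at certain time frames.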

Index Terms: multimodal fusion, landmark-driven ASR, event-based speech recognition


Cite as: Ziegler, S., Gravier, G. (2013) A framework for integrating heterogeneous sporadic knowledge sources into automatic speech recognition. Proc. First Workshop on Speech, Language and Audio in Multimedia (SLAM 2013), 37-42

@inproceedings{ziegler13_slam,
  author={Stefan Ziegler and Guillaume Gravier},
  title={{A framework for integrating heterogeneous sporadic knowledge sources into automatic speech recognition}},
  year=2013,
  booktitle={Proc. First Workshop on Speech, Language and Audio in Multimedia (SLAM 2013)},
  pages={37--42}
}