EUROSPEECH 2003 - INTERSPEECH 2003
As part of our effort to develop a unified computational framework for speech-to-speech translation, in which sub-optimal or purely local optimizations can be avoided, we are developing direct models for speech recognition. In a direct model, the focus is on building a single integrated model p(text | acoustics) rather than a complex series of separately trained components; as a result, factors such as linguistic and semantic features, speaker or speaking-rate differences, and varying acoustic conditions can all enter a joint optimization. In this paper we discuss how linguistic and semantic constraints are used in phoneme recognition.
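The contrast between a chain of separately optimized models and one integrated model p(text | acoustics) can be sketched with a toy log-linear direct model, in which acoustic, language-model, and semantic evidence are combined as features under a single set of weights. This is only an illustrative sketch under that assumption; the feature names, values, and weights below are hypothetical and not taken from the paper.

```python
import math

def direct_model_posterior(features, weights, hypotheses):
    """Toy direct model: score each hypothesis with one log-linear
    model over acoustic, linguistic, and semantic features, then
    normalize to obtain p(text | acoustics)."""
    scores = {h: sum(weights[f] * v for f, v in features[h].items())
              for h in hypotheses}
    z = sum(math.exp(s) for s in scores.values())
    return {h: math.exp(scores[h]) / z for h in scores}

# Two candidate transcriptions with illustrative (made-up) feature values.
features = {
    "recognize speech":    {"acoustic": 1.2, "lm": 0.8, "semantic": 0.5},
    "wreck a nice beach":  {"acoustic": 1.1, "lm": 0.3, "semantic": 0.1},
}
# In a direct model these weights would be optimized jointly,
# rather than each knowledge source being trained in isolation.
weights = {"acoustic": 1.0, "lm": 1.0, "semantic": 1.0}

posterior = direct_model_posterior(features, weights, features.keys())
best = max(posterior, key=posterior.get)
```

Because all knowledge sources contribute to one objective, changing (say) the semantic features reshapes the whole posterior rather than requiring a separate model to be retuned.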
Bibliographic reference: Gao, Yuqing (2003): "Coupling vs. unifying: modeling techniques for speech-to-speech translation", in EUROSPEECH-2003, 365-368.