EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

Modeling Linguistic Features in Speech Recognition

Min Tang, Stephanie Seneff, Victor W. Zue

Massachusetts Institute of Technology, USA

This paper explores a new approach to speech recognition in which sub-word units are modeled in terms of linguistic features. Specifically, we model the manner and the place of articulation of these units separately. A novelty of our work is a generalized definition of place of articulation that lets us map both vowels and consonants into a common linguistic space. Modeling manner and place separately also allows us to explore a multi-stage recognition architecture, in which the search space is successively reduced as more detailed models are brought in. On the 8,000-word PhoneBook isolated-word telephone speech recognition task, we show that this approach achieves a word error rate (WER) that is 10% better than the best results reported in the literature. This gain in accuracy is accompanied by a smaller search space and reduced computation time.
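The multi-stage idea in the abstract can be pictured with a toy sketch: a coarse first pass keeps only the words compatible with the observed manner classes, and a more detailed second pass ranks the survivors. Everything below is an assumption made for illustration and is not taken from the paper: the phone set, the manner classes, the six-word lexicon, the helper names, and the use of exact matching in place of whatever probabilistic scoring the actual recognizer uses.

```python
# Toy illustration only. The phone set, manner classes, lexicon, and scoring
# below are hypothetical and are NOT taken from the paper.

# Hypothetical mapping from phones to coarse manner-of-articulation classes.
MANNER = {
    "p": "stop", "t": "stop", "k": "stop",
    "s": "fricative", "f": "fricative",
    "m": "nasal", "n": "nasal",
    "ae": "vowel", "ih": "vowel",
}

# Hypothetical pronunciation lexicon.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "pat": ["p", "ae", "t"],
    "sat": ["s", "ae", "t"],
    "man": ["m", "ae", "n"],
    "tip": ["t", "ih", "p"],
    "fin": ["f", "ih", "n"],
}


def manner_sequence(phones):
    """Map a phone sequence to its sequence of manner classes."""
    return tuple(MANNER[p] for p in phones)


def first_stage(observed_manner, lexicon):
    """Coarse pass: keep words whose manner sequence equals the observation."""
    target = tuple(observed_manner)
    return {w: ph for w, ph in lexicon.items() if manner_sequence(ph) == target}


def second_stage(observed_phones, candidates):
    """Detailed pass: pick the candidate sharing the most phones position by
    position (a stand-in for detailed acoustic scoring)."""
    def score(word):
        return sum(a == b for a, b in zip(candidates[word], observed_phones))
    return max(candidates, key=score)


candidates = first_stage(["stop", "vowel", "stop"], LEXICON)
best = second_stage(["p", "ae", "t"], candidates)
print(sorted(candidates), best)
```

In this toy run the first pass cuts the six-word lexicon to three candidates before any detailed comparison is made, which is the sense in which the search space shrinks from stage to stage.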

Bibliographic reference.  Tang, Min / Seneff, Stephanie / Zue, Victor W. (2003): "Modeling linguistic features in speech recognition", In EUROSPEECH-2003, 2585-2588.