EUROSPEECH 2003 - INTERSPEECH 2003
This paper explores a new approach to speech recognition in which sub-word units are modeled in terms of linguistic features. Specifically, we model the manner and place of articulation of these units separately. A novelty of our work is a generalized definition of place of articulation that enables us to map both vowels and consonants into a common linguistic space. Modeling manner and place separately also allows us to explore a multi-stage recognition architecture, in which the search space is successively reduced as more detailed models are brought in. On the 8,000-word PhoneBook isolated-word telephone speech recognition task, we show that this approach achieves a word error rate (WER) that is 10% better than the best results reported in the literature. This performance gain is accompanied by improvements in search space and computation time as well.
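The multi-stage idea described above can be illustrated with a toy sketch. It is not the authors' system: the feature table, the tiny lexicon, the function names, and the symbolic match scoring are all hypothetical stand-ins for the acoustic models used in the paper. The sketch assumes a first stage that prunes the lexicon using manner classes only, followed by a second stage that ranks the surviving words using place classes drawn from one label set shared by vowels and consonants.

```python
from typing import Dict, List, Tuple

# Hypothetical feature table: phone -> (manner, place).
# The labels are illustrative only and are NOT the paper's feature inventory.
# Vowels and consonants share one set of place labels.
FEATURES: Dict[str, Tuple[str, str]] = {
    "p": ("stop", "labial"),
    "t": ("stop", "alveolar"),
    "k": ("stop", "velar"),
    "s": ("fricative", "alveolar"),
    "m": ("nasal", "labial"),
    "iy": ("vowel", "palatal"),
    "uw": ("vowel", "velar"),
}

# Hypothetical toy lexicon: word -> phone sequence.
LEXICON: Dict[str, List[str]] = {
    "tea": ["t", "iy"],
    "key": ["k", "iy"],
    "pea": ["p", "iy"],
    "see": ["s", "iy"],
    "me": ["m", "iy"],
    "two": ["t", "uw"],
}


def manner_seq(phones: List[str]) -> List[str]:
    """Map a phone sequence to its sequence of manner classes."""
    return [FEATURES[p][0] for p in phones]


def place_seq(phones: List[str]) -> List[str]:
    """Map a phone sequence to its sequence of place classes."""
    return [FEATURES[p][1] for p in phones]


def stage1_prune(lexicon: Dict[str, List[str]],
                 observed_manner: List[str]) -> Dict[str, List[str]]:
    """Stage 1: keep only words whose manner sequence matches the observation."""
    return {w: ph for w, ph in lexicon.items()
            if manner_seq(ph) == observed_manner}


def stage2_rank(candidates: Dict[str, List[str]],
                observed_place: List[str]) -> List[str]:
    """Stage 2: rank surviving words by the number of matching place labels."""
    def score(phones: List[str]) -> int:
        return sum(a == b for a, b in zip(place_seq(phones), observed_place))
    return sorted(candidates, key=lambda w: score(candidates[w]), reverse=True)


# Assumed observation: a stop followed by a vowel, with velar then palatal place.
survivors = stage1_prune(LEXICON, ["stop", "vowel"])
ranking = stage2_rank(survivors, ["velar", "palatal"])
print(sorted(survivors), ranking[0])
```

In this toy setting the manner stage discards the words that begin with a fricative or a nasal, so the more detailed place comparison runs on a smaller candidate set. A real recognizer would replace the exact symbolic matches with probabilistic scores.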
Bibliographic reference. Tang, Min / Seneff, Stephanie / Zue, Victor W. (2003): "Modeling linguistic features in speech recognition", In EUROSPEECH-2003, 2585-2588.