12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

A Paradigm for Limited Vocabulary Speech Recognition Based on Redundant Spectro-Temporal Feature Sets

Sourish Chaudhuri (1), Bhiksha Raj (1), Tony Ezzat (2)

(1) Carnegie Mellon University, USA
(2) MIT, USA

Speech recognition techniques have come to rely almost completely on HMM-based frameworks. In this paper, we present a novel paradigm for small-vocabulary speech recognition based on a recently proposed word-spotting technique. Recent work that used discriminative classifiers with ordered spectro-temporal features to detect keywords obtained encouraging improvements over HMM-based models. We extend this approach to continuous speech recognition. Our method uses discriminative models to predict which words are present in a speech signal and to hypothesize their locations. A dynamic-programming graph search then finds the most likely word sequence in the hypothesis set obtained by combining the outputs of the discriminative word classifiers. While this approach does not perform as well as state-of-the-art ASR systems, it can be particularly useful for languages for which only small amounts of annotated data are available.
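
The graph-search step can be illustrated with a minimal sketch. The abstract does not specify the graph structure or the scoring, so everything below is an assumption made for illustration: each word classifier is assumed to emit hypotheses of the form (word, start frame, end frame, positive confidence score), and decoding is assumed to select the non-overlapping sequence of hypotheses with the highest total score. The names `Hypothesis` and `best_word_sequence` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Hypothesis(NamedTuple):
    """One word hypothesis (assumed format, not the paper's)."""
    word: str
    start: int    # first frame, inclusive
    end: int      # last frame, exclusive
    score: float  # assumed positive confidence; higher is better


def best_word_sequence(hyps: List[Hypothesis]) -> Tuple[float, List[str]]:
    """Return the best total score and the corresponding word sequence.

    Dynamic programming over hypotheses sorted by end frame:
    best[i] is the best total score using only the first i hypotheses.
    """
    ordered = sorted(hyps, key=lambda h: h.end)
    ends = [h.end for h in ordered]
    n = len(ordered)
    best = [0.0] * (n + 1)
    taken = [False] * (n + 1)  # was hypothesis i used to reach best[i]?
    prev = [0] * (n + 1)       # DP index to continue backtracking from

    for i, h in enumerate(ordered, start=1):
        # Count earlier hypotheses that end at or before this one starts.
        j = bisect_right(ends, h.start, 0, i - 1)
        with_h = best[j] + h.score
        if with_h > best[i - 1]:
            best[i], taken[i], prev[i] = with_h, True, j
        else:
            best[i], taken[i], prev[i] = best[i - 1], False, i - 1

    # Follow back-pointers to recover the chosen words in time order.
    words: List[str] = []
    i = n
    while i > 0:
        if taken[i]:
            words.append(ordered[i - 1].word)
        i = prev[i]
    words.reverse()
    return best[n], words


hypotheses = [
    Hypothesis("one", 0, 10, 2.0),
    Hypothesis("nine", 5, 15, 1.5),
    Hypothesis("two", 10, 20, 1.0),
    Hypothesis("three", 15, 30, 2.5),
]
total, sequence = best_word_sequence(hypotheses)
print(total, sequence)
```

In this toy input, "nine" overlaps both "one" and "two", so the search has to choose among competing hypotheses rather than simply keep every detection. A real system would presumably also need to handle gaps, insertion penalties, or language-model scores, none of which are modeled here.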


Bibliographic reference. Chaudhuri, Sourish / Raj, Bhiksha / Ezzat, Tony (2011): "A paradigm for limited vocabulary speech recognition based on redundant spectro-temporal feature sets", in INTERSPEECH-2011, 3169-3172.