13th Annual Conference of the International Speech Communication Association

Portland, OR, USA
September 9-13, 2012

Enhancing Exemplar-Based Posteriors for Speech Recognition Tasks

Tara N. Sainath, David Nahamoo, Dimitri Kanevsky, Bhuvana Ramabhadran

IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

Posteriors generated from exemplar-based sparse representation (SR) methods are typically learned to minimize the reconstruction error of the feature vectors; they are not learned through a discriminative process linked to the word error rate (WER) objective of a speech recognition task. In this paper, we explore modeling exemplar-based posteriors to address this issue. First, we train a Neural Network (NN) using exemplar-based posteriors as inputs, producing a new set of posteriors learned to minimize a cross-entropy measure and, indirectly, the frame error rate. Second, we apply a tied mixture smoothing technique to these NN posteriors, making them better suited for a speech recognition task. On the TIMIT task, modeling the SR posteriors with a NN improves performance by 1.3% absolute, achieving a phonetic error rate (PER) of 19.0%. Applying the further smoothing technique to these NN posteriors improves the PER to 18.7%, one of the best results reported in the literature on TIMIT.
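The two steps above can be illustrated with a minimal sketch: a small network trained with cross-entropy to map exemplar-based posteriors to frame-level class posteriors, followed by an interpolation-style smoothing of the NN outputs against a broader reference distribution. This is a toy illustration only, not the paper's implementation; the synthetic data, network size, learning rate, and the use of the class prior as the smoothing distribution are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy data: rows of X stand in for exemplar-based (SR) posteriors over K
# classes; the frame label is taken to be the most likely SR class here,
# purely so the toy task is learnable.
K, N, H = 5, 200, 16
X = softmax(rng.normal(size=(N, K)))
y = X.argmax(axis=1)
Y = np.eye(K)[y]                       # one-hot frame targets

# Step 1: one-hidden-layer NN on top of the SR posteriors, trained with
# full-batch gradient descent on the cross-entropy objective.
W1 = rng.normal(scale=0.1, size=(K, H)); b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, K)); b2 = np.zeros(K)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, softmax(h @ W2 + b2)

def xent(P, Y):
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

lr = 0.5
loss0 = xent(forward(X)[1], Y)
for _ in range(200):
    h, P = forward(X)
    dZ2 = (P - Y) / N                  # gradient of cross-entropy w.r.t. logits
    dW2 = h.T @ dZ2; db2 = dZ2.sum(0)
    dH = (dZ2 @ W2.T) * (1 - h**2)     # backprop through tanh
    dW1 = X.T @ dH; db1 = dH.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
loss1 = xent(forward(X)[1], Y)

# Step 2: smoothing in the spirit of tied mixtures -- interpolate the sharp
# NN posteriors with a broader distribution (here, the empirical class
# prior; an assumption) so that no class receives near-zero probability.
lam = 0.8
prior = Y.mean(axis=0)
P_nn = forward(X)[1]
P_smooth = lam * P_nn + (1 - lam) * prior
```

Each row of `P_smooth` remains a valid distribution (the interpolation of two distributions sums to one), and the floor contributed by the prior is what makes the smoothed posteriors better behaved when plugged into a recognizer's decoding.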


Bibliographic reference: Sainath, Tara N. / Nahamoo, David / Kanevsky, Dimitri / Ramabhadran, Bhuvana (2012): "Enhancing exemplar-based posteriors for speech recognition tasks", in INTERSPEECH-2012, 2130-2133.