12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Reducing Computational Complexities of Exemplar-Based Sparse Representations with Applications to Large Vocabulary Speech Recognition

Tara N. Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky

IBM T.J. Watson Research Center, USA

Recently, exemplar-based sparse representation phone identification features (Spif) have shown promising results on large vocabulary speech recognition tasks. However, exemplar-based techniques are computationally expensive. In this paper, we present two methods to speed up the creation of Spif features. First, we explore a technique to quickly select a subset of informative exemplars from among millions of training examples. Second, we make approximations to the sparse representation computation such that a matrix-matrix multiplication is reduced to a matrix-vector product. We present results on four large vocabulary tasks: Broadcast News, where acoustic models are trained on 50 and 400 hours of data, and Voice Search, where models are trained on 160 and 1000 hours. Results on all tasks show a speedup by a factor of four relative to the original Spif features, as well as improvements in word error rate (WER) when combined with a baseline HMM system.
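
As a rough illustration of the general pipeline the abstract describes (select a small set of exemplars, solve for a sparse representation, then pool the weights by phone class), a minimal Python sketch follows. It is not the paper's method: the nearest-neighbour selection, the matching-pursuit solver, the pooling of absolute weights per class, and all function names (`select_exemplars`, `matching_pursuit`, `class_features`) are assumptions chosen for simplicity, and the matrix-vector approximation mentioned above is not reproduced here.

```python
import math

def select_exemplars(y, exemplars, labels, k):
    """Keep the k training exemplars closest to y (Euclidean distance).

    Assumption: plain nearest-neighbour search stands in for the paper's
    fast selection technique, which the abstract does not specify.
    """
    ranked = sorted(range(len(exemplars)),
                    key=lambda i: math.dist(y, exemplars[i]))
    keep = ranked[:k]
    return [exemplars[i] for i in keep], [labels[i] for i in keep]

def matching_pursuit(y, H, n_iter=10):
    """Greedy sparse approximation y ~ sum_i beta[i] * H[i].

    Assumption: matching pursuit is a generic sparse solver used here
    only for illustration.
    """
    beta = [0.0] * len(H)
    residual = list(y)
    norms = [sum(v * v for v in h) for h in H]
    for _ in range(n_iter):
        best, best_score, best_step = None, 0.0, 0.0
        for i, h in enumerate(H):
            if norms[i] == 0.0:
                continue
            dot = sum(a * b for a, b in zip(h, residual))
            score = dot * dot / norms[i]  # residual energy this atom removes
            if score > best_score:
                best, best_score, best_step = i, score, dot / norms[i]
        if best is None:  # residual is orthogonal to every exemplar
            break
        beta[best] += best_step
        residual = [r - best_step * a for r, a in zip(residual, H[best])]
    return beta

def class_features(beta, labels, n_classes):
    """Pool absolute weights per class and normalise to sum to one.

    Assumption: a simplified stand-in for phone identification features.
    """
    feats = [0.0] * n_classes
    for b, c in zip(beta, labels):
        feats[c] += abs(b)
    total = sum(feats)
    return [f / total for f in feats] if total > 0 else feats

# Toy data: four 2-D "exemplars" with hypothetical class labels 0 and 1.
exemplars = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
labels = [0, 1, 0, 1]
y = [1.0, 0.05]

H, H_labels = select_exemplars(y, exemplars, labels, k=3)
beta = matching_pursuit(y, H)
feats = class_features(beta, H_labels, n_classes=2)
print(feats)
```

In this toy case the test vector lies close to the class-0 exemplars, so most of the pooled weight falls on class 0. The selection step matters for cost because the solver only ever sees the k retained exemplars rather than the full training set.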

Bibliographic reference. Sainath, Tara N. / Ramabhadran, Bhuvana / Nahamoo, David / Kanevsky, Dimitri (2011): "Reducing computational complexities of exemplar-based sparse representations with applications to large vocabulary speech recognition", In INTERSPEECH-2011, 785-788.