15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

Posterior-Based Sparse Representation for Automatic Speech Recognition

Sara Bahaadini, Afsaneh Asaei, David Imseng, Hervé Bourlard

Idiap Research Institute, Switzerland

Posterior features have been shown to yield very good performance in multiple contexts, including speech recognition, spoken term detection, and template matching. Today, posterior features are usually estimated at the output of a neural network. More recently, sparse representation has also been shown to potentially improve discrimination and robustness. One instance of this is exemplar-based sparse representation.
   The present work investigates how to exploit sparse modelling together with posterior space properties to further improve speech recognition features. In that context, we leverage exemplar-based sparse representation and propose a novel approach to project phone posterior features into a new, high-dimensional, sparse feature space. Exploiting the properties of posterior spaces, we generate new, high-dimensional, linguistically inspired (sub-phone and word) posterior distributions. Validation experiments on the Phonebook (isolated words) and HIWIRE (continuous speech) databases support the effectiveness of the proposed approach for speech recognition tasks.
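
As a rough illustration of the general idea of exemplar-based sparse representation applied to posterior vectors, the sketch below expresses one phone posterior vector as a sparse, nonnegative combination of exemplar posterior vectors, then normalises the weights so they can be read as a distribution over exemplars. This is not the authors' method: the solver (projected ISTA on an L1-regularised least-squares objective), the function names `sparse_code` and `to_distribution`, the parameter values, and the toy data are all assumptions made for illustration.

```python
def sparse_code(x, exemplars, lam=0.05, iters=500):
    """Nonnegative L1-regularised least squares, solved by projected ISTA.

    Minimises 0.5 * ||x - sum_j a[j] * exemplars[j]||^2 + lam * sum_j a[j]
    subject to a[j] >= 0. Hypothetical helper, not taken from the paper.
    """
    n, k = len(exemplars), len(x)
    # Conservative step size: 1 / squared Frobenius norm of the dictionary,
    # which is never larger than 1 / (Lipschitz constant of the gradient).
    step = 1.0 / sum(v * v for e in exemplars for v in e)
    a = [0.0] * n
    for _ in range(iters):
        # Residual of the current reconstruction.
        r = [sum(a[j] * exemplars[j][i] for j in range(n)) - x[i]
             for i in range(k)]
        # Gradient step, plus lam for the L1 term, then projection onto a >= 0.
        a = [max(0.0, a[j] - step * (sum(exemplars[j][i] * r[i]
                                         for i in range(k)) + lam))
             for j in range(n)]
    return a


def to_distribution(weights):
    """Normalise nonnegative weights to sum to one (uniform if all are zero)."""
    total = sum(weights)
    if total == 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


# Toy data (assumed): four exemplar posteriors over three phone classes.
exemplars = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
    [0.80, 0.10, 0.10],
]
x = [0.85, 0.10, 0.05]  # posterior vector for one frame

weights = sparse_code(x, exemplars)
feature = to_distribution(weights)  # one entry per exemplar
```

The resulting `feature` has one dimension per exemplar rather than per phone class, which is the sense in which such a representation is higher-dimensional. In a real system the exemplars would presumably carry linguistic labels; how those labels are used is not specified in this abstract and is left out of the sketch.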

Bibliographic reference.  Bahaadini, Sara / Asaei, Afsaneh / Imseng, David / Bourlard, Hervé (2014): "Posterior-based sparse representation for automatic speech recognition", In INTERSPEECH-2014, 2454-2458.