Symposium on Machine Learning in Speech and Language Processing (MLSLP)
Portland, Oregon, USA
In this paper, we compare two different frameworks for exemplar-based speech recognition and propose a combined system that approximates the input speech as a linear combination of exemplars of variable length. This approach allows us not only to use exemplars of multiple lengths, each representing a certain speech unit, but also to approximate input speech segments jointly with several exemplars. While such an approach can model noisy speech, it also requires a feature representation in which the contributions of different signal sources are additive. This restriction is observed to limit recognition accuracy compared with, for example, discriminatively trained representations. To identify the main sources of recognition errors, we investigate the performance of a sequence of systems, ranging from a baseline single-neighbor exemplar matching system using discriminative features to the proposed combined system. Even though the proposed approach has a lower recognition accuracy than the baseline, it significantly outperforms the intermediate systems that use comparable features.
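The abstract contrasts picking the single closest exemplar with approximating a speech segment as a non-negative linear combination of several exemplars. The Python sketch below illustrates both ideas under simplifying assumptions that are not taken from the paper. Exemplars are fixed-length, flattened, non-negative feature vectors, whereas the paper uses exemplars of variable length. The combined approximation minimizes the KL divergence plus an L1 sparsity penalty using multiplicative updates. This is a common choice in exemplar-based sparse representation work, but the abstract does not confirm it. Single-neighbor matching here uses plain Euclidean distance. The function names `reconstruct`, `sparse_activations`, and `nearest_exemplar` are hypothetical.

```python
"""Hypothetical sketch: single-neighbour exemplar matching versus a
sparse non-negative combination of exemplars (assumed KL + L1 cost)."""
import math


def reconstruct(exemplars, weights):
    """Return A x, i.e. sum_j weights[j] * exemplars[j]."""
    dim = len(exemplars[0])
    out = [0.0] * dim
    for w, a in zip(weights, exemplars):
        for i in range(dim):
            out[i] += w * a[i]
    return out


def sparse_activations(y, exemplars, lam=0.1, iters=200, eps=1e-12):
    """Non-negative weights x that (locally) minimise
    KL(y || A x) + lam * sum(x) via multiplicative updates.

    Assumes y and all exemplars are non-negative, so that the
    contributions of different exemplars add up (the additivity
    requirement the abstract refers to)."""
    n = len(exemplars)
    x = [1.0] * n                          # strictly positive start
    col_sums = [sum(a) for a in exemplars]  # A^T 1
    for _ in range(iters):
        approx = reconstruct(exemplars, x)
        # Element-wise y / (A x), guarded against division by zero.
        ratio = [yi / max(ai, eps) for yi, ai in zip(y, approx)]
        for j, a in enumerate(exemplars):
            num = sum(aij * r for aij, r in zip(a, ratio))  # (A^T ratio)_j
            # The L1 penalty in the denominator shrinks unused exemplars.
            x[j] *= num / (col_sums[j] + lam)
    return x


def nearest_exemplar(y, exemplars):
    """Single-neighbour matching: index of the closest exemplar."""
    dists = [math.dist(y, a) for a in exemplars]
    return min(range(len(exemplars)), key=dists.__getitem__)


if __name__ == "__main__":
    exemplars = [[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [0, 1, 0, 0, 1]]
    y = [2, 1, 2, 1, 0]  # built as 2 * exemplar 0 + 1 * exemplar 1
    print("nearest exemplar:", nearest_exemplar(y, exemplars))
    x = sparse_activations(y, exemplars, lam=0.1)
    print("activations:", [round(v, 3) for v in x])
```

In this toy example, the single-neighbor matcher can only report exemplar 0. The sparse combination instead assigns weight to both exemplars that make up `y` and drives the unused third exemplar towards zero. The L1 penalty also shrinks the active weights slightly, to about `2/1.05` and `1/1.05` here.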
Index Terms: speech recognition, exemplar-based, template matching, sparse representations
Bibliographic reference. Yılmaz, Emre / Compernolle, Dirk Van / Van hamme, Hugo (2012): "Combining exemplar-based matching and exemplar-based sparse representations of speech", In MLSLP-2012, 30-33.