Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval Application

Stephanie Pancoast, Murat Akbacak


In addition to the increasing number of publicly available multimedia documents generated and searched every day, there is also a large corpus of personalized videos, images and spoken recordings stored on users’ private devices and/or in their personal accounts in the cloud. Retrieving spoken items via voice commonly involves supervised indexing approaches such as large-vocabulary speech recognition. When these items are personalized recordings, their diverse and personalized content causes recognition systems to experience mismatches, mostly in the vocabulary and language model components, and sometimes even in the language the users speak. All of these factors cause the retrieval task to perform very poorly. Alternatively, common audio patterns can be captured and used for exemplar-based retrieval in an unsupervised fashion, but this approach has its limitations as well. In this work we explore supervised, unsupervised and fusion techniques to perform retrieval of short personalized spoken utterances. On a small collection of personal recordings, we find that by fusing word, phoneme and unsupervised frame-based systems, we can improve accuracy on the top retrieved item by approximately 3% over the best-performing individual system. Besides demonstrating this improvement on our initial collection, we hope to attract the community’s interest to such novel personalized retrieval applications.
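The abstract's fusion of word, phoneme and unsupervised frame-based systems can be illustrated with a minimal late (score-level) fusion sketch. This is an assumed illustration, not the paper's actual method: the min-max normalization, the fusion weights, and the per-system score dictionaries below are all hypothetical.

```python
def normalize(scores):
    """Min-max normalize a {doc_id: score} dict so systems are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero for single-item lists
    return {doc: (s - lo) / span for doc, s in scores.items()}

def fuse(system_scores, weights):
    """Weighted sum of normalized per-system scores; unseen docs score 0."""
    fused = {}
    for scores, w in zip(system_scores, weights):
        for doc, s in normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    # Return doc ids ranked best-first
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical scores from three retrieval systems over candidate recordings
word_scores = {"a": 2.0, "b": 1.5, "c": 0.5}   # word-based recognizer
phone_scores = {"a": 0.9, "b": 1.1, "d": 0.3}  # phoneme-based index
frame_scores = {"b": 0.7, "c": 0.6}            # unsupervised frame matching

ranking = fuse([word_scores, phone_scores, frame_scores],
               weights=[0.5, 0.3, 0.2])
print(ranking[0])  # top retrieved item after fusion
```

The intuition is that each representation fails in different ways (out-of-vocabulary words hurt the word system less than the phoneme or frame-based ones, for example), so a weighted combination can outrank any single system, as the reported ~3% top-1 gain suggests.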


DOI: 10.21437/Interspeech.2016-1589

Cite as

Pancoast, S., Akbacak, M. (2016) Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval Application. Proc. Interspeech 2016, 3071-3075.

Bibtex
@inproceedings{Pancoast+2016,
  author={Stephanie Pancoast and Murat Akbacak},
  title={Teaming Up: Making the Most of Diverse Representations for a Novel Personalized Speech Retrieval Application},
  year={2016},
  booktitle={Interspeech 2016},
  doi={10.21437/Interspeech.2016-1589},
  url={http://dx.doi.org/10.21437/Interspeech.2016-1589},
  pages={3071--3075}
}