In this paper we present a technique to increase the robustness of a self-learning speech-controlled system comprising speech recognition, speaker identification, and speaker adaptation. Our goal is the automatic personalization of a speech-controlled device for groups of 5-10 recurring speakers. Speakers should be identified and tracked across speaker turns by their voice patterns alone. Efficient information retrieval and the statistical representation of speaker characteristics have to be combined with reliable and flexible speaker identification. Even with limited adaptation data, e.g. 2-3 command-and-control utterances, speakers have to be tracked reliably to allow continuous adaptation of complex statistical models. We present a novel approach to speaker identification on different time scales based on a unified speech and speaker model. Experiments were carried out on a subset of the SPEECON database.
Bibliographic reference. Herbig, Tobias / Gerl, Franz / Minker, Wolfgang (2010): "Speaker tracking in an unsupervised speech controlled system", In INTERSPEECH-2010, 2666-2669.