8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Unsupervised Speaker Indexing Using Anchor Models and Automatic Transcription of Discussions

Yuya Akita, Tatsuya Kawahara

Kyoto University, Japan

We present unsupervised speaker indexing combined with automatic speech recognition (ASR) for speech archives such as discussions. The proposed indexing method is based on anchor models, with which we define a feature vector based on similarity to the speakers of a large-scale speech database. Several techniques are introduced to improve its discriminative ability. ASR is then performed using the indexing results. Because no discussion corpus is available for training acoustic and language models, we apply speaker adaptation to the baseline acoustic model based on the indexing results. We also construct a language model by merging two models that cover different linguistic features. We achieve a speaker indexing accuracy of 93% and a significant improvement in ASR performance on real discussion data.
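
The anchor-model idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the model form, so the single diagonal Gaussian per anchor speaker, the mean-centering of the scores, the Euclidean distance, and all names and toy values below are assumptions made for illustration.

```python
import math

def gaussian_loglik(frame, mean, var):
    """Log-likelihood of one frame under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )

def anchor_vector(frames, anchors):
    """Map a speech segment to one average log-likelihood per anchor speaker."""
    scores = [
        sum(gaussian_loglik(f, mean, var) for f in frames) / len(frames)
        for mean, var in anchors
    ]
    # Assumption: centre the scores so the vector keeps only relative similarity.
    centre = sum(scores) / len(scores)
    return [s - centre for s in scores]

def distance(u, v):
    """Euclidean distance between two anchor-space vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical anchors: (mean, variance) pairs in a toy 2-D feature space.
ANCHORS = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 0.0], [1.0, 1.0]),
    ([0.0, 3.0], [1.0, 1.0]),
]

seg_a1 = [[0.9, 0.2], [1.1, -0.1]]  # toy segment from speaker A
seg_a2 = [[1.0, 0.1], [0.8, 0.0]]   # another toy segment from speaker A
seg_b = [[0.1, 2.0], [-0.2, 2.2]]   # toy segment from speaker B

vec_a1 = anchor_vector(seg_a1, ANCHORS)
vec_a2 = anchor_vector(seg_a2, ANCHORS)
vec_b = anchor_vector(seg_b, ANCHORS)
```

In this toy setting, segments from the same speaker land close together in anchor space, so `distance(vec_a1, vec_a2)` is smaller than `distance(vec_a1, vec_b)`. An indexing step could then group segments by clustering these vectors without any prior model of the discussion participants.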

Bibliographic reference. Akita, Yuya / Kawahara, Tatsuya (2003): "Unsupervised speaker indexing using anchor models and automatic transcription of discussions", in EUROSPEECH-2003, 2985-2988.