EUROSPEECH 2003 - INTERSPEECH 2003
We present unsupervised speaker indexing combined with automatic speech recognition (ASR) for speech archives such as discussions. The proposed indexing method is based on anchor models, by which a feature vector is defined from the similarity to the speakers of a large-scale speech database. Several techniques are introduced to improve the discriminative ability of this representation. ASR is then performed using the indexing results. Because no discussion corpus is available for training acoustic and language models, we apply speaker adaptation to the baseline acoustic model based on the indexing results, and we construct a language model by merging two models that cover different linguistic features. On real discussion data, we achieved a speaker indexing accuracy of 93% and a significant improvement in ASR performance.
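The anchor-model idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation. Several simplifications are assumptions made only for illustration: each anchor speaker is a single diagonal Gaussian rather than a full speaker model, the score vector is mean-normalized, and utterances are grouped with a plain two-cluster k-means. The names `avg_loglik`, `anchor_vector`, and `two_means` are hypothetical, and the data are synthetic.

```python
import numpy as np

def avg_loglik(frames, mean, var):
    """Average per-frame log-likelihood under one diagonal Gaussian."""
    diff = frames - mean
    per_frame = -0.5 * (np.log(2.0 * np.pi * var).sum()
                        + (diff * diff / var).sum(axis=1))
    return float(per_frame.mean())

def anchor_vector(frames, anchors):
    """Describe an utterance by its similarity to every anchor speaker."""
    scores = np.array([avg_loglik(frames, m, v) for m, v in anchors])
    # Mean normalization is an assumption, not a step taken from the paper.
    return scores - scores.mean()

def two_means(vectors, n_iter=10):
    """Tiny k-means with k=2, seeded with the first vector and the one farthest from it."""
    far = ((vectors - vectors[0]) ** 2).sum(axis=1).argmax()
    centers = np.stack([vectors[0], vectors[far]])
    for _ in range(n_iter):
        dists = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = vectors[labels == k].mean(axis=0)
    return labels

# Synthetic stand-ins: 8 anchor speakers and 6 utterances from 2 unseen speakers.
rng = np.random.default_rng(0)
dim, n_anchors = 4, 8
anchors = [(rng.normal(0.0, 2.0, dim), np.ones(dim)) for _ in range(n_anchors)]
speaker_means = [np.full(dim, -1.5), np.full(dim, 1.5)]
true_speakers = [0, 0, 0, 1, 1, 1]
utterances = [rng.normal(speaker_means[s], 1.0, size=(200, dim))
              for s in true_speakers]

vectors = np.stack([anchor_vector(u, anchors) for u in utterances])
labels = two_means(vectors)
print(labels)  # cluster index assigned to each utterance
```

The point of the sketch is that the speakers being indexed never need their own models: each utterance is located relative to a fixed set of anchor speakers, and clustering operates in that space.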
Bibliographic reference. Akita, Yuya / Kawahara, Tatsuya (2003): "Unsupervised speaker indexing using anchor models and automatic transcription of discussions", In EUROSPEECH-2003, 2985-2988.