ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition
April 13-16, 2003
We present unsupervised speaker indexing combined with automatic speech recognition (ASR) for speech archives such as discussions. The proposed indexing method is based on anchor models: each speech segment is represented by a feature vector of its similarities to the speakers of a large-scale speech database, and we incorporate several techniques to improve the discriminative ability of this representation. ASR is then performed using the indexing results. Because no discussion corpus is available for training acoustic and language models, we applied speaker adaptation to the baseline acoustic model based on the indexing results, and constructed a language model by merging two models that cover different linguistic features. We achieved a speaker indexing accuracy of 93% and a word recognition accuracy of 57% on real discussion data.
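The anchor-model idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each anchor speaker is a single diagonal-covariance Gaussian (practical systems typically use richer speaker models), and the function names, the mean-centering step, and the toy data are all hypothetical. It shows only how a segment can be mapped to a vector of similarities to anchor speakers and how two such vectors can be compared.

```python
import math

def gaussian_loglik(frame, mean, var):
    """Log-likelihood of one frame under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )

def anchor_vector(frames, anchors):
    """Map a segment to one average log-likelihood per anchor speaker."""
    scores = [
        sum(gaussian_loglik(f, mean, var) for f in frames) / len(frames)
        for mean, var in anchors
    ]
    # Assumption: centre the scores so the vector reflects relative similarity.
    center = sum(scores) / len(scores)
    return [s - center for s in scores]

def cosine(u, v):
    """Cosine similarity between two anchor-space vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy anchor speakers as (mean, variance) pairs; purely illustrative values.
anchors = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 3.0], [1.0, 1.0]),
    ([-3.0, 3.0], [1.0, 1.0]),
]

# Segments A and B imitate one speaker, segment C a different one.
seg_a = [[0.1, -0.2], [0.3, 0.1]]
seg_b = [[-0.1, 0.2], [0.0, -0.3]]
seg_c = [[2.9, 3.1], [3.2, 2.8]]

vec_a = anchor_vector(seg_a, anchors)
vec_b = anchor_vector(seg_b, anchors)
vec_c = anchor_vector(seg_c, anchors)

same_speaker = cosine(vec_a, vec_b)
different_speaker = cosine(vec_a, vec_c)
```

In this toy setting, segments from the same speaker yield a higher cosine similarity than segments from different speakers, which is the property that unsupervised clustering of segments would rely on.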
Bibliographic reference. Akita, Yuya / Nishida, Masafumi / Kawahara, Tatsuya (2003): "Automatic transcription of discussions using unsupervised speaker indexing", in SSPR-2003, paper MAP11.