ISCA & IEEE Workshop on Spontaneous Speech Processing and Recognition

April 13-16, 2003
Tokyo Institute of Technology, Tokyo, Japan

Automatic Transcription of Discussions Using Unsupervised Speaker Indexing

Yuya Akita (1,2), Masafumi Nishida (2), Tatsuya Kawahara (1,2)

(1) School of Informatics, Kyoto University, Japan
(2) Japan Science and Technology Corporation, PRESTO

We present an unsupervised speaker indexing method combined with automatic speech recognition (ASR) for speech archives such as discussions. The indexing method is based on anchor models, which define a feature vector from the similarity to the speakers of a large-scale speech database, and we incorporate several techniques to improve its discriminative ability. ASR is then performed using the indexing results. Because no discussion corpus is available for training acoustic and language models, we apply speaker adaptation to the baseline acoustic model based on the indexing results, and we construct a language model by merging two models that cover different linguistic features. On real discussion data, we achieved a speaker indexing accuracy of 93% and a word recognition accuracy of 57%.
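The anchor-model idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single diagonal Gaussian per anchor speaker, the mean normalisation, the cosine comparison, and all names and toy values below are assumptions chosen only to show how an utterance can be described by its similarity to a fixed set of reference speakers.

```python
import math

def log_likelihood(frames, mean, var):
    """Average per-frame log-likelihood under a diagonal Gaussian (assumed anchor model)."""
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, mean, var):
            total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

def anchor_vector(frames, anchors):
    """Describe an utterance by its similarity to each anchor speaker."""
    scores = [log_likelihood(frames, mean, var) for mean, var in anchors]
    centre = sum(scores) / len(scores)
    # Mean normalisation is an assumption, not a step taken from the paper.
    return [s - centre for s in scores]

def cosine(a, b):
    """Cosine similarity between two anchor-space vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical anchor speakers as (mean, variance) pairs in a 2-D feature space.
anchors = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 3.0], [1.0, 1.0]),
    ([-3.0, 3.0], [1.0, 1.0]),
]

# Toy utterances: two from one unknown speaker, one from another.
utt_a1 = [[0.1, -0.2], [0.3, 0.1]]
utt_a2 = [[-0.1, 0.2], [0.2, -0.3]]
utt_b = [[2.8, 3.1], [3.2, 2.9]]

vec_a1 = anchor_vector(utt_a1, anchors)
vec_a2 = anchor_vector(utt_a2, anchors)
vec_b = anchor_vector(utt_b, anchors)

same_speaker = cosine(vec_a1, vec_a2)
different_speaker = cosine(vec_a1, vec_b)
print(same_speaker, different_speaker)
```

In this toy setting, utterances from the same speaker land close together in the anchor space, so an unsupervised clustering step could group them without any enrolment data for the discussion participants.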

Bibliographic reference. Akita, Yuya / Nishida, Masafumi / Kawahara, Tatsuya (2003): "Automatic transcription of discussions using unsupervised speaker indexing", in SSPR-2003, paper MAP11.