INTERSPEECH 2004 - ICSLP
8th International Conference on Spoken Language Processing

Jeju Island, Korea
October 4-8, 2004

Unsupervised Speaker Adaptation using High Confidence Portion Recognition Results by Multiple Recognition Systems

Tomohiro Watanabe (1), Hiromitsu Nishizaki (2), Takehito Utsuro (3), Seiichi Nakagawa (1)

(1) Toyohashi University of Technology, Japan
(2) University of Yamanashi, Japan
(3) Kyoto University, Japan

This paper describes an accurate unsupervised speaker adaptation method for lecture speech recognition that uses multiple large-vocabulary continuous speech recognition (LVCSR) systems. In an unsupervised speaker adaptation framework, the improvement in recognition performance obtained by adapting acoustic models depends greatly on the accuracy of the labels, such as phonemes and syllables. Extracting the adaptation data under the guidance of confidence measures is therefore effective for unsupervised adaptation. We identify high-confidence portions based on the agreement between two LVCSR systems, adapt the acoustic models using these portions with their highly accurate labels, and thereby improve recognition accuracy. Applied to the Corpus of Spontaneous Japanese (CSJ), our method improved the recognition rate by about 5% in comparison with a traditional method.
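The idea of selecting adaptation data by agreement between two recognizers can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the exact agreement criterion, so the sketch assumes each system outputs a flat sequence of syllable labels and treats any run of at least `min_run` identical labels in both outputs as a high-confidence portion. The function name `agreed_portions`, the `min_run` parameter, and the example syllables are all hypothetical.

```python
from difflib import SequenceMatcher


def agreed_portions(hyp_a, hyp_b, min_run=2):
    """Return (start_index_in_hyp_a, tokens) for each run of tokens
    on which the two hypotheses agree.

    Assumption: agreement is measured by aligning the two label
    sequences and keeping matching runs of at least `min_run` tokens.
    """
    matcher = SequenceMatcher(a=hyp_a, b=hyp_b, autojunk=False)
    portions = []
    for block in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel; the size check skips it.
        if block.size >= min_run:
            start = block.a
            portions.append((start, hyp_a[start:start + block.size]))
    return portions


# Made-up syllable outputs from two hypothetical recognizers;
# they disagree only on the fifth syllable.
hyp_a = ["ko", "re", "wa", "on", "se", "i", "de", "su"]
hyp_b = ["ko", "re", "wa", "on", "te", "i", "de", "su"]

selected = agreed_portions(hyp_a, hyp_b, min_run=3)
for start, tokens in selected:
    print(start, " ".join(tokens))
```

In this sketch, only the agreed runs would be passed on as labeled adaptation data, and the disputed syllable would be discarded. Raising `min_run` trades the amount of adaptation data for stricter agreement.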

Bibliographic reference.  Watanabe, Tomohiro / Nishizaki, Hiromitsu / Utsuro, Takehito / Nakagawa, Seiichi (2004): "Unsupervised speaker adaptation using high confidence portion recognition results by multiple recognition systems", In INTERSPEECH-2004, 1989-1992.