EUROSPEECH 2003 - INTERSPEECH 2003
8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003

        

Speaker Adaptation Using Regression Classes Generated by Phonetic Decision Tree-Based Successive State Splitting

Se-Jin Oh (1), Kwang-Dong Kim (1), Duk-Gyoo Roh (1), Woo-Chang Sung (2), Hyun-Yeol Chung (2)

(1) Korea Astronomy Observatory, Korea
(2) Yeungnam University, Korea

In this paper, we propose a new way of generating regression classes for the MLLR (maximum likelihood linear regression) speaker adaptation method, using the phonetic decision tree-based successive state splitting (PDTSSS) algorithm, so as to represent speaker characteristics effectively. The method extends state splitting by clustering the context components of the adaptation data into a tree structure. This makes it possible to control the number of adaptation parameters (mean, variance) automatically, depending on the context information and the amount of adaptation utterances from a new speaker. In experiments on Korean phone and word recognition, adaptation raised phone and word recognition accuracy by an average of 34-37% and 9%, respectively, over the speaker-independent acoustic models. The results confirm a significant performance gain over no speaker adaptation, even with a small number of adaptation utterances.
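The idea of letting the amount of adaptation data decide how many regression classes are used can be illustrated with a generic regression class tree. The sketch below is not the PDTSSS algorithm described in the paper; it only shows the common data-driven rule in which each leaf is tied to the deepest node on its path that has enough adaptation frames. The names `Node` and `assign_classes`, the tree layout, the frame counts, and the `min_frames` threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a hypothetical regression class tree."""
    name: str
    frames: int = 0  # adaptation frames observed directly at this node
    children: List["Node"] = field(default_factory=list)

    def total(self) -> int:
        """Frames at this node plus all of its descendants."""
        return self.frames + sum(c.total() for c in self.children)


def assign_classes(node: Node, min_frames: int,
                   inherited: Optional[str] = None,
                   out: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map each leaf to the deepest node on its path with enough data.

    The root always defines a class (a single global transform), so every
    leaf receives some transform even when very little data is available.
    """
    if out is None:
        out = {}
    enough = node.total() >= min_frames
    owner = node.name if (inherited is None or enough) else inherited
    if not node.children:
        out[node.name] = owner
    for child in node.children:
        assign_classes(child, min_frames, owner, out)
    return out


# Illustrative tree with made-up frame counts (not data from the paper).
tree = Node("root", children=[
    Node("vowels", children=[Node("a", 120), Node("i", 30)]),
    Node("consonants", children=[Node("k", 20), Node("s", 10)]),
])

some_data = assign_classes(tree, min_frames=100)
little_data = assign_classes(tree, min_frames=1000)
print(some_data)
print(len(set(some_data.values())), "classes;",
      len(set(little_data.values())), "class with a stricter threshold")
```

With a threshold of 100 frames, the well-observed leaf `a` gets its own class, `i` shares the `vowels` class, and both consonants fall back to the global transform. With a threshold of 1000, every leaf shares the single root class, so the number of transforms to estimate shrinks as the data becomes sparse relative to the threshold.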

Bibliographic reference.  Oh, Se-Jin / Kim, Kwang-Dong / Roh, Duk-Gyoo / Sung, Woo-Chang / Chung, Hyun-Yeol (2003): "Speaker adaptation using regression classes generated by phonetic decision tree-based successive state splitting", In EUROSPEECH-2003, 1457-1460.