Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Subword-Dependent Speaker Clustering for Improved Speech Recognition

Li Jiang, Xuedong Huang

Speech Technology Group, Microsoft Research, Redmond, WA, USA

Speaker variability has a significant impact on state-of-the-art speech recognition systems. Traditionally, speaker clustering is performed without considering individual or class phonetic similarities across different speakers. In fact, clustered speaker groups may show very different degrees of variation for different phonetic classes. In this paper, speaker clustering is performed at the subword or subphonetic level. With one or more instances derived from clustering for each subword or subphonetic unit, we model speaker variation explicitly across different subword or subphonetic instances. In addition, we select from the very large number of possible combinations of speaker-clustered subword models to form our initial model for speaker adaptation. Experiments show that subword-dependent speaker clustering is more effective than traditional speaker clustering.
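The idea of clustering speakers separately for each subword unit can be illustrated with a minimal sketch. The abstract does not specify the features, distance measure, or clustering algorithm used in the paper, so everything below is an assumption made for illustration: each speaker is summarized by one mean feature vector per subword unit, and speakers are grouped with plain k-means using a deterministic farthest-point initialization. The names `kmeans`, `cluster_per_subword`, and the toy `stats` data are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points: Dict[str, Vector], k: int, iters: int = 20) -> Dict[str, int]:
    """Plain k-means over named points; returns name -> cluster index.

    Assumption: farthest-point initialization, chosen only to keep the
    example deterministic. It is not taken from the paper.
    """
    names = sorted(points)
    centroids = [list(points[names[0]])]
    while len(centroids) < k:
        far = max(names, key=lambda n: min(sq_dist(points[n], c) for c in centroids))
        centroids.append(list(points[far]))

    assign: Dict[str, int] = {}
    for _ in range(iters):
        new_assign = {
            n: min(range(k), key=lambda j: sq_dist(points[n], centroids[j]))
            for n in names
        }
        if new_assign == assign:
            break  # assignments are stable, so the centroids are too
        assign = new_assign
        for j in range(k):
            members = [points[n] for n in names if assign[n] == j]
            if members:
                centroids[j] = mean(members)
    return assign


def cluster_per_subword(
    stats: Dict[str, Dict[str, Vector]], k: int
) -> Dict[str, Dict[str, int]]:
    """Cluster speakers independently for every subword unit.

    stats maps speaker -> subword unit -> mean feature vector.
    Returns subword unit -> (speaker -> cluster index).
    """
    units = sorted({u for per_spk in stats.values() for u in per_spk})
    return {
        u: kmeans({s: v[u] for s, v in stats.items() if u in v}, k)
        for u in units
    }


# Toy data (hypothetical): the speakers group differently for "aa" and "s".
stats = {
    "spkA": {"aa": [0.0, 0.1], "s": [0.0, 0.0]},
    "spkB": {"aa": [0.2, 0.0], "s": [5.0, 5.1]},
    "spkC": {"aa": [5.0, 5.0], "s": [0.1, 0.2]},
    "spkD": {"aa": [5.1, 4.9], "s": [5.2, 5.0]},
}

clusters = cluster_per_subword(stats, k=2)
for unit, assignment in clusters.items():
    print(unit, assignment)
```

In this toy data, speakers A and B fall together for "aa" while A and C fall together for "s", which is the situation a single speaker-level clustering cannot represent. Selecting one cluster per unit for a new speaker would then yield one of many possible combinations of subword models; how the paper performs that selection is not described in the abstract and is not sketched here.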

Bibliographic reference.  Jiang, Li / Huang, Xuedong (2000): "Subword-dependent speaker clustering for improved speech recognition", In ICSLP-2000, vol.4, 137-140.