Sixth International Conference on Spoken Language Processing
Speaker variability has a significant impact on state-of-the-art speech recognition systems. Traditionally, speaker clustering is performed without considering individual or class phonetic similarities across different speakers. In fact, clustered speaker groups may show very different degrees of variation for different phonetic classes. In this paper, speaker clustering is performed at the subword or subphonetic level. With one or more instances derived from clustering for each subword or subphonetic unit, we model speaker variation explicitly across different subword or subphonetic instances. In addition, we select from the very large number of possible combinations of speaker-clustered subword models to form our initial model for speaker adaptation. Experiments show that subword-dependent speaker clustering is more effective than traditional speaker clustering.
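The idea of clustering speakers separately for each subword unit can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a clustering algorithm, distance measure, or feature representation, so this example assumes plain k-means over one hypothetical mean feature vector per speaker and unit. The names `kmeans`, `cluster_per_unit`, and the toy data are illustrative only, and the later step of selecting among combinations of clustered models is not shown.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> tuple:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Plain k-means (an assumption, not the paper's stated method).

    Centroids are initialised from the first k points so that the
    result is deterministic.
    """
    if k > len(points):
        raise ValueError("k must not exceed the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
    return labels


def cluster_per_unit(
    speaker_means: Dict[str, Dict[str, Vector]], k: int
) -> Dict[str, Dict[str, int]]:
    """Cluster speakers independently for every subword unit."""
    speakers = sorted(speaker_means)
    units = sorted({u for s in speakers for u in speaker_means[s]})
    result = {}
    for unit in units:
        points = [speaker_means[s][unit] for s in speakers]
        result[unit] = dict(zip(speakers, kmeans(points, k)))
    return result


# Hypothetical toy data: one 2-D mean vector per speaker and unit.
toy = {
    "A": {"aa": (0.0, 0.0), "s": (1.0, 1.0)},
    "B": {"aa": (0.2, 0.1), "s": (4.0, 4.1)},
    "C": {"aa": (5.0, 5.0), "s": (1.1, 0.9)},
    "D": {"aa": (5.2, 4.9), "s": (4.2, 3.9)},
}

groups = cluster_per_unit(toy, k=2)
for unit, assignment in groups.items():
    print(unit, assignment)
```

In this toy data, speakers A and B fall into the same cluster for "aa" but into different clusters for "s", which is the kind of unit-dependent grouping that a single global speaker clustering cannot represent.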
Bibliographic reference. Jiang, Li / Huang, Xuedong (2000): "Subword-dependent speaker clustering for improved speech recognition", in ICSLP-2000, vol. 4, 137-140.