ISCA Archive Interspeech 2006

Rapid speaker adaptation using regression-tree based spectral peak alignment

Shizhen Wang, Xiaodong Cui, Abeer Alwan

This paper proposes regression-tree based spectral peak alignment for rapid speaker adaptation, using the linearization of VTLN. Two kinds of regression classes are investigated: phonetic classes (built using combined knowledge-based and data-driven techniques) and mixture classes. Compared to MLLR and VTLN, the approach yields improved performance for both supervised and unsupervised adaptation on both medium-vocabulary and connected-digit recognition tasks. To further improve performance, MLLR is integrated into the regression-tree based peak alignment. Experimental results show that performance improvements can be achieved even with limited adaptation data.


doi: 10.21437/Interspeech.2006-423

Cite as: Wang, S., Cui, X., Alwan, A. (2006) Rapid speaker adaptation using regression-tree based spectral peak alignment. Proc. Interspeech 2006, paper 1334-Wed1A2O.1, doi: 10.21437/Interspeech.2006-423

@inproceedings{wang06e_interspeech,
  author={Shizhen Wang and Xiaodong Cui and Abeer Alwan},
  title={{Rapid speaker adaptation using regression-tree based spectral peak alignment}},
  year=2006,
  booktitle={Proc. Interspeech 2006},
  pages={paper 1334-Wed1A2O.1},
  doi={10.21437/Interspeech.2006-423}
}