In this paper, an MLLR-like adaptation approach is proposed in which the transformation of the Gaussian means is performed deterministically based on a linearization of VTLN, while the biases and the adaptation of the variances are estimated statistically by the EM algorithm. In the discrete frequency domain, we show that, under certain approximations, frequency warping with Mel-filterbank-based MFCCs is equivalent to a linear transformation in the cepstral domain. Using this linear relationship, the transformation matrix is generated by formant-like peak alignment. Experimental results on children's speech show improvements over traditional MLLR and VTLN, even with limited amounts of adaptation data.
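To illustrate the core idea that frequency warping can be approximated by a linear transform on MFCCs, the sketch below builds such a matrix numerically. This is only a minimal illustration under simplifying assumptions: the warp is modeled as linear interpolation over filterbank channel indices, and the matrix is formed with the DCT and its pseudo-inverse. All names (`dct_matrix`, `warp_interp_matrix`, `alpha`) are hypothetical; the paper itself derives the matrix via formant-like peak alignment, which is not reproduced here.

```python
import numpy as np

def dct_matrix(n_cep, n_filt):
    """Type-II DCT matrix mapping log Mel-filterbank energies to cepstra."""
    k = np.arange(n_cep)[:, None]
    m = np.arange(n_filt)[None, :]
    return np.sqrt(2.0 / n_filt) * np.cos(np.pi * k * (m + 0.5) / n_filt)

def warp_interp_matrix(n_filt, alpha):
    """Interpolation matrix approximating a frequency warp by factor alpha,
    assumed (for illustration only) to act linearly on the channel index."""
    W = np.zeros((n_filt, n_filt))
    for i in range(n_filt):
        pos = np.clip(i * alpha, 0, n_filt - 1)
        lo = int(np.floor(pos))
        frac = pos - lo
        W[i, lo] += 1.0 - frac
        if lo + 1 < n_filt:
            W[i, lo + 1] += frac
    return W

n_cep, n_filt, alpha = 13, 24, 0.9       # illustrative warping factor
D = dct_matrix(n_cep, n_filt)            # cepstra = D @ log_energies
W = warp_interp_matrix(n_filt, alpha)    # warped log energies ~ W @ log_energies
A = D @ W @ np.linalg.pinv(D)            # MFCC-domain linear transform: c' ~ A @ c

mu = np.random.randn(n_cep)              # a Gaussian mean vector in MFCC space
mu_warped = A @ mu                       # deterministic MLLR-like mean transform
```

In the MLLR-like scheme described above, a matrix of this kind would be applied deterministically to the HMM means, while the bias vector and variance scaling would still be estimated from adaptation data with EM.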
Cite as: Cui, X., Alwan, A. (2005) MLLR-like speaker adaptation based on linearization of VTLN with MFCC features. Proc. Interspeech 2005, 273-276, doi: 10.21437/Interspeech.2005-156
@inproceedings{cui05_interspeech,
  author    = {Xiaodong Cui and Abeer Alwan},
  title     = {{MLLR-like speaker adaptation based on linearization of VTLN with MFCC features}},
  booktitle = {Proc. Interspeech 2005},
  year      = {2005},
  pages     = {273--276},
  doi       = {10.21437/Interspeech.2005-156}
}