14th Annual Conference of the International Speech Communication Association

Lyon, France
August 25-29, 2013

Synthetic Speaker Models Using VTLN to Improve the Performance of Children in Mismatched Speaker Conditions for ASR

D. R. Sanand, T. Svendsen

NTNU, Norway

This paper proposes training synthetic speaker models using vocal tract length normalization (VTLN). Speaker adaptation approaches require a certain amount of data from the test speaker to either update or transform the parameters of the trained model. When very little or no data is available from the test speaker, we propose to create a synthetic speaker model that is acoustically close to the test speaker by scaling the training data with VTLN. To this end, we train multiple VTLN-warped speaker-independent (SI) models by scaling the training data with VTLN, and choose the model that is acoustically close to the test speaker for recognition. We show that the proposed approach is advantageous in mismatched speaker conditions, especially when recognizing children's speech using models trained on adult speech.
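The train-then-select idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the warp-factor grid `ALPHAS`, the linear warping function and its direction convention, the single diagonal Gaussian standing in for a full SI acoustic model, the synthetic "spectra", and the use of average log-likelihood as the measure of acoustic closeness.

```python
import math
import random

NUM_BINS = 40
ALPHAS = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2]  # hypothetical warp-factor grid

def bump_frames(center, width, count, rng, noise=0.05):
    """Toy 'spectra': one formant-like bump plus Gaussian noise per bin."""
    return [[math.exp(-0.5 * ((k - center) / width) ** 2) + rng.gauss(0.0, noise)
             for k in range(NUM_BINS)]
            for _ in range(count)]

def warp(frame, alpha):
    """Linear frequency warping (assumed convention): output bin k reads
    the input at position k * alpha, using linear interpolation."""
    last = len(frame) - 1
    out = []
    for k in range(len(frame)):
        pos = min(k * alpha, last)  # clamp at the top of the band
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * frame[lo] + frac * frame[hi])
    return out

def fit_gaussian(frames):
    """Fit a diagonal Gaussian; a stand-in for training a full SI model."""
    n = len(frames)
    columns = list(zip(*frames))
    means = [sum(col) / n for col in columns]
    variances = [max(sum((x - m) ** 2 for x in col) / n, 1e-3)  # variance floor
                 for col, m in zip(columns, means)]
    return means, variances

def avg_loglik(frames, model):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for frame in frames:
        for x, m, v in zip(frame, means, variances):
            total -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total / len(frames)

rng = random.Random(0)
# Synthetic "adult" training data and a small amount of "child" test data
# whose bump sits at a higher frequency (both are made-up toy signals).
adult_train = bump_frames(center=10.0, width=3.0, count=200, rng=rng)
child_test = bump_frames(center=12.5, width=3.75, count=20, rng=rng)

# One model per warp factor, each trained on the warped training data.
models = {a: fit_gaussian([warp(f, a) for f in adult_train]) for a in ALPHAS}

# Pick the warped model that best explains the test speaker's data.
best_alpha = max(models, key=lambda a: avg_loglik(child_test, models[a]))
print("selected warp factor:", best_alpha)
```

In a real recognizer, the warping would typically be applied during feature extraction and each warped model would be a complete acoustic model; the sketch only shows the overall shape of the procedure.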

Bibliographic reference. Sanand, D. R. / Svendsen, T. (2013): "Synthetic speaker models using VTLN to improve the performance of children in mismatched speaker conditions for ASR", in INTERSPEECH-2013, 3361-3365.