ISCA Archive Interspeech 2008

Region-based vocal tract length normalization for ASR

Michail G. Maragakis, Alexandros Potamianos

In this paper, we propose a Region-based multi-parametric Vocal Tract Length Normalization (R-VTLN) algorithm for automatic speech recognition (ASR). The proposed algorithm extends the well-established mono-parametric utterance-based VTLN algorithm of Lee and Rose [1] by dividing the speech frames of a test utterance into regions and independently warping the features of each region using a maximum likelihood criterion. We propose two algorithms for classifying frames into regions: an unsupervised clustering algorithm, and an unsupervised algorithm that assigns frames to regions based on phonetic-class labels obtained from a first recognition pass. We also investigate, as a function of phone, how well various mono-parametric and multi-parametric warping functions reduce the spectral distance between two speakers. R-VTLN is shown to significantly outperform mono-parametric VTLN in terms of word accuracy on the AURORA4 database.
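
The per-region maximum likelihood warping described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy setting: each frame is a single scalar feature, warping is a plain multiplication by a factor alpha, the acoustic model is one Gaussian, region labels are already given, and the warp factor is picked by grid search without any Jacobian compensation. The names `best_warp`, `region_vtln`, and `GRID`, along with all numeric values, are hypothetical.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of a scalar x under a 1-D Gaussian (toy acoustic model)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def best_warp(frames, mean, var, grid):
    """Grid search: return the warp factor that maximizes the total
    log-likelihood of the warped frames (assumed warp: alpha * x)."""
    def score(alpha):
        return sum(gaussian_loglik(alpha * x, mean, var) for x in frames)
    return max(grid, key=score)

def region_vtln(frames, regions, mean, var, grid):
    """Estimate one warp factor per region, independently of other regions."""
    by_region = {}
    for x, r in zip(frames, regions):
        by_region.setdefault(r, []).append(x)
    return {r: best_warp(xs, mean, var, grid) for r, xs in by_region.items()}

# Hypothetical search grid: 0.88, 0.90, ..., 1.12
GRID = [round(0.88 + 0.02 * i, 2) for i in range(13)]

# Hypothetical data: scalar features and region labels, e.g. from a first pass.
frames = [1100.0, 1120.0, 1090.0, 950.0, 960.0, 940.0]
regions = ["vowel", "vowel", "vowel", "consonant", "consonant", "consonant"]

# Mono-parametric baseline: one warp factor for the whole utterance.
utterance_warp = best_warp(frames, mean=1000.0, var=50.0 ** 2, grid=GRID)

# Region-based variant: one warp factor per region.
region_warps = region_vtln(frames, regions, mean=1000.0, var=50.0 ** 2, grid=GRID)

print(utterance_warp)  # 0.96
print(region_warps)    # {'vowel': 0.9, 'consonant': 1.06}
```

In this toy data a single utterance-level factor has to compromise between the two groups of frames, whereas the per-region search selects a different factor for each group, which is the intuition behind moving from one warping parameter to several.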


doi: 10.21437/Interspeech.2008-397

Cite as: Maragakis, M.G., Potamianos, A. (2008) Region-based vocal tract length normalization for ASR. Proc. Interspeech 2008, 1365-1368, doi: 10.21437/Interspeech.2008-397

@inproceedings{maragakis08_interspeech,
  author={Michail G. Maragakis and Alexandros Potamianos},
  title={{Region-based vocal tract length normalization for ASR}},
  year=2008,
  booktitle={Proc. Interspeech 2008},
  pages={1365--1368},
  doi={10.21437/Interspeech.2008-397}
}