Interspeech'2005 - Eurospeech
Speaker normalization by (piecewise) linear warping of the frequency axis is a popular method because of its simplicity and effectiveness. However, when this so-called vocal tract length normalization is applied to map test speakers with a shorter vocal tract onto acoustic models trained on speakers with a longer vocal tract, important information is missing from the frequency bins at the high end of the spectrum. Usually, this missing information is reconstructed by ad hoc rules or through extrapolation of the spectrum. In this paper, we present a new method to estimate the content of those bins. The proposed solution is derived from Missing Data Techniques, which are used in noise-robust speech recognizers. Missing Data Techniques are usually expressed in the spectral domain, which costs accuracy; to alleviate this loss, we apply the PROSPECT feature representation introduced about a year ago. We demonstrate the superiority of our approach on the TIDigits database.
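The missing high-frequency bins described above can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a purely linear warp applied by resampling a toy magnitude spectrum, and the names `warp_spectrum`, `beta` and `reliable` are hypothetical. It only shows where the unreliable bins arise; it does not reconstruct them.

```python
import numpy as np

def warp_spectrum(spectrum, beta):
    """Linearly warp a spectrum: output bin k reads the source at position k * beta.

    With beta > 1 the frequency axis is compressed (illustrative stand-in for
    mapping a shorter vocal tract onto a longer one). Output bins whose source
    position lies beyond the last available bin have no data and are flagged.
    """
    n = len(spectrum)
    bins = np.arange(n)
    src = bins * beta                      # source position for each output bin
    reliable = src <= n - 1                # False where the source is out of range
    warped = np.full(n, np.nan)            # missing bins are left as NaN
    warped[reliable] = np.interp(src[reliable], bins, spectrum)
    return warped, reliable

# Toy 11-bin spectrum falling linearly from 1.0 to 0.0 (assumed data).
spectrum = np.linspace(1.0, 0.0, 11)
warped, reliable = warp_spectrum(spectrum, beta=1.25)
print(warped)
print(reliable)
```

In this sketch the `reliable` mask plays the role of the reliability mask used in missing data techniques: the bins flagged `False` are the ones whose content would have to be estimated rather than read from the warped spectrum.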
Bibliographic reference. Jansen, Wim / Hamme, Hugo Van (2005): "PROSPECT features and their application to missing data techniques for vocal tract length normalization", In INTERSPEECH-2005, 2753-2756.