Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match a speaker-independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme in which the frequency axis of the short-time spectrum of a speaker's speech is rescaled or warped prior to the extraction of cepstral features. In this work, we develop a novel speaker normalization scheme by exploiting the fact that frequency-domain transformations similar to that inherent in VTLN can be accomplished entirely in the cepstral domain through the use of conformal maps. We propose a class of such maps, designated all-pass transforms for reasons given hereafter, and, in a set of speech recognition experiments conducted on the Switchboard Corpus, demonstrate their capacity to achieve word error rate reductions of 3.7% absolute.
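To make the idea of a frequency warping induced by an all-pass map concrete, the following is a minimal sketch that assumes the simplest member of such a family, a first-order all-pass (bilinear) map Q(z) = (z - alpha) / (1 - alpha z) with a real parameter |alpha| < 1. The abstract does not state the specific form of the maps, so this choice, the parameter name alpha, and the function names are illustrative assumptions rather than the authors' formulation. On the unit circle this map has unit magnitude, and its phase gives the warped frequency.

```python
import cmath
import math


def bilinear_warp(omega, alpha):
    """Warped frequency (radians) for the assumed first-order all-pass map.

    Closed form of the phase of Q(e^{j*omega}) with
    Q(z) = (z - alpha) / (1 - alpha * z), real |alpha| < 1.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def allpass_response(omega, alpha):
    """Evaluate Q(z) directly at z = e^{j*omega} as a numerical cross-check."""
    z = cmath.exp(1j * omega)
    return (z - alpha) / (1.0 - alpha * z)


if __name__ == "__main__":
    alpha = 0.3  # hypothetical warp parameter, chosen only for illustration
    for k in range(5):
        omega = k * math.pi / 4.0
        print(f"omega={omega:.4f}  warped={bilinear_warp(omega, alpha):.4f}")
```

Under these assumptions the endpoints 0 and pi stay fixed and the axis in between is stretched or compressed monotonically, which is the qualitative behaviour that VTLN-style warping requires. Setting alpha to 0 recovers the identity map.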
Cite as: McDonough, J., Byrne, W., Luo, X. (1998) Speaker normalization with all-pass transforms. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0869, doi: 10.21437/ICSLP.1998-747
@inproceedings{mcdonough98_icslp,
  author={John McDonough and William Byrne and Xiaoqiang Luo},
  title={{Speaker normalization with all-pass transforms}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0869},
  doi={10.21437/ICSLP.1998-747}
}