5th International Conference on Spoken Language Processing

Sydney, Australia
November 30 - December 4, 1998

Speaker Normalization with All-Pass Transforms

John McDonough, William Byrne, Xiaoqiang Luo

Center for Language and Speech Processing, The Johns Hopkins University, USA

Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed to better match a speaker-independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme in which the frequency axis of the short-time spectrum of a speaker's speech is rescaled, or warped, before cepstral features are extracted. In this work, we develop a novel speaker normalization scheme by exploiting the fact that frequency-domain transformations similar to the one inherent in VTLN can be accomplished entirely in the cepstral domain through the use of conformal maps. We propose a class of such maps, designated all-pass transforms for reasons given hereafter, and, in a set of speech recognition experiments conducted on the Switchboard Corpus, demonstrate their capacity to achieve word error rate reductions of 3.7% absolute.
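To make the idea of a conformal map that warps the frequency axis concrete, the following is a minimal sketch of the simplest such map, the first-order all-pass (bilinear) transform. The abstract does not state the form of the proposed maps, so the choice of this transform, the parameter name `alpha`, and the function names `allpass_response` and `allpass_warp` are assumptions made for illustration only. The sketch shows only the induced frequency warping, not the cepstral-domain computation.

```python
import numpy as np


def allpass_response(omega, alpha):
    """Evaluate the first-order all-pass map on the unit circle.

    Assumed form (illustrative, not taken from the abstract):
        A(z) = (z^-1 - alpha) / (1 - alpha * z^-1),  with |alpha| < 1.
    """
    z_inv = np.exp(-1j * np.asarray(omega, dtype=float))
    return (z_inv - alpha) / (1.0 - alpha * z_inv)


def allpass_warp(omega, alpha):
    """Warped frequency induced by the map above.

    On the unit circle A(e^{j*omega}) = e^{-j*warped}, where
        warped = omega + 2 * atan(alpha * sin(omega) / (1 - alpha * cos(omega))).
    """
    omega = np.asarray(omega, dtype=float)
    return omega + 2.0 * np.arctan2(alpha * np.sin(omega),
                                    1.0 - alpha * np.cos(omega))


if __name__ == "__main__":
    # Hypothetical warp factors; real values would be estimated per speaker.
    grid = np.linspace(0.0, np.pi, 5)
    for alpha in (-0.1, 0.0, 0.1):
        print(alpha, np.round(allpass_warp(grid, alpha), 3))
```

Under this assumed map, the transform has unit magnitude at every frequency (hence "all-pass"), leaves the endpoints 0 and pi fixed, and reduces to the identity when `alpha` is zero, so a single parameter stretches or compresses the frequency axis much as a VTLN warp factor does.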


Bibliographic reference. McDonough, John / Byrne, William / Luo, Xiaoqiang (1998): "Speaker normalization with all-pass transforms", in ICSLP-1998, paper 0869.