8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Frequency Domain Correspondence for Speaker Normalization

Ming Liu (1), Xi Zhou (1), Mark Hasegawa-Johnson (1), Thomas S. Huang (1), Zhengyou Zhang (2)

(1) University of Illinois at Urbana-Champaign, USA
(2) Microsoft Research, USA

Due to physiological and linguistic differences between speakers, the spectral patterns of the same phoneme produced by two speakers can be quite dissimilar. Without appropriate alignment along the frequency axis, this inter-speaker variation reduces modeling efficiency and degrades performance. In this paper, a novel data-driven framework is proposed to align the frequency axes of two speakers. This alignment is essentially a frequency-domain correspondence between the two speakers. To establish it, we formulate the task as an optimal matching problem. Local matching is performed by comparing local features of the spectrogram across frequency bins, which captures the similarity of local spectral patterns at different bins. Dynamic programming is then applied to find the globally optimal alignment between the two frequency axes. Experiments on TIDIGITS and TIMIT show the effectiveness of this method.
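The two-stage procedure (local matching of frequency bins, then dynamic programming for a global alignment) can be illustrated with a minimal sketch. The abstract does not specify the local features, the distance, or the dynamic-programming step pattern, so everything below is an assumption for illustration only: each speaker is represented by a single log-magnitude spectrum (for example, averaged over frames of the same phoneme), the hypothetical local feature at bin k is the mean-removed patch of neighboring bins, bins are compared with Euclidean distance, and the global alignment uses a DTW-style recursion with steps (1,1), (1,0), and (0,1). The function names `local_features`, `local_cost`, and `align_frequency_axes` are hypothetical and do not come from the paper.

```python
import math


def local_features(spectrum, half_width=2):
    """Hypothetical local feature (assumption, not the paper's definition):
    for each frequency bin k, the patch spectrum[k-half_width : k+half_width+1]
    with edge padding, mean-removed so that only the local shape is compared."""
    n = len(spectrum)
    feats = []
    for k in range(n):
        patch = [spectrum[min(max(j, 0), n - 1)]
                 for j in range(k - half_width, k + half_width + 1)]
        mu = sum(patch) / len(patch)
        feats.append([p - mu for p in patch])
    return feats


def local_cost(feats_a, feats_b):
    """Local matching: Euclidean distance between every pair of bin features."""
    return [[math.dist(u, v) for v in feats_b] for u in feats_a]


def align_frequency_axes(spec_a, spec_b, half_width=2):
    """Global alignment by dynamic programming (DTW-style, assumed step pattern).

    Finds a monotonic path from bin pair (0, 0) to (Na-1, Nb-1) that minimizes
    the summed local cost. Returns (total_cost, path), where path is a list of
    (bin_a, bin_b) pairs, i.e. a frequency-domain correspondence."""
    cost = local_cost(local_features(spec_a, half_width),
                      local_features(spec_b, half_width))
    na, nb = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * nb for _ in range(na)]   # accumulated cost
    back = [[None] * nb for _ in range(na)]  # back-pointers for path recovery
    acc[0][0] = cost[0][0]
    for i in range(na):
        for j in range(nb):
            if i == 0 and j == 0:
                continue
            best, arg = inf, None
            # Diagonal is checked first, so it wins ties.
            for di, dj in ((1, 1), (1, 0), (0, 1)):
                pi, pj = i - di, j - dj
                if pi >= 0 and pj >= 0 and acc[pi][pj] < best:
                    best, arg = acc[pi][pj], (pi, pj)
            acc[i][j] = best + cost[i][j]
            back[i][j] = arg
    # Trace the optimal path back from the last bin pair.
    path, node = [], (na - 1, nb - 1)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return acc[na - 1][nb - 1], path
```

Calling `align_frequency_axes(spec_a, spec_b)` returns the accumulated cost and a list of bin pairs; aligning a spectrum with itself yields the diagonal path with zero cost. The monotonic step pattern keeps the warped frequency axis order-preserving, which is the usual choice for frequency warping, but the paper's actual constraints may differ.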


Bibliographic reference.  Liu, Ming / Zhou, Xi / Hasegawa-Johnson, Mark / Huang, Thomas S. / Zhang, Zhengyou (2007): "Frequency domain correspondence for speaker normalization", In INTERSPEECH-2007, 274-277.