ISCA Archive Interspeech 2007

Frequency domain correspondence for speaker normalization

Ming Liu, Xi Zhou, Mark Hasegawa-Johnson, Thomas S. Huang, Zhengyou Zhang

Because of physiological and linguistic differences between speakers, the spectral pattern of the same phoneme can differ considerably from one speaker to another. Without appropriate alignment along the frequency axis, this inter-speaker variation reduces modeling efficiency and degrades performance. In this paper, a novel data-driven framework is proposed to align the frequency axes of two speakers. This alignment is essentially a frequency domain correspondence between the two speakers. To establish the correspondence, we formulate the task as an optimal matching problem. Local matching is achieved by comparing local features of the spectrogram across frequency bins, which captures the similarity of local patterns at different frequency bins. After the local matching, dynamic programming is applied to find the globally optimal alignment between the two frequency axes. Experiments on TIDIGITS and TIMIT clearly show the effectiveness of this method.
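The two-stage idea described in the abstract (local matching of frequency bins, then dynamic programming for a global alignment) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the local features, the distance measure, or the path constraints. The sketch assumes each frequency bin is summarized by a small feature vector, uses Euclidean distance as the local cost, and allows the standard three monotonic steps familiar from dynamic time warping. The names `local_cost` and `align_frequency_axes` are hypothetical.

```python
import math


def local_cost(feat_a, feat_b):
    """Euclidean distance between two per-bin feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(feat_a, feat_b)))


def align_frequency_axes(bins_a, bins_b):
    """Find a monotonic alignment between two frequency axes.

    bins_a[i] / bins_b[j] are feature vectors for frequency bin i of
    speaker A and bin j of speaker B (a hypothetical representation).
    Returns (path, total_cost), where path is a list of (i, j) pairs.
    """
    n, m = len(bins_a), len(bins_b)
    inf = float("inf")
    # acc[i][j]: minimal accumulated cost of any path ending at (i, j)
    acc = [[inf] * m for _ in range(n)]
    # back[i][j]: predecessor of (i, j) on the best path
    back = [[None] * m for _ in range(n)]

    for i in range(n):
        for j in range(m):
            cost = local_cost(bins_a[i], bins_b[j])
            if i == 0 and j == 0:
                acc[i][j] = cost
                continue
            # Allowed steps (assumed): diagonal, advance A only, advance B only
            candidates = []
            if i > 0 and j > 0:
                candidates.append((acc[i - 1][j - 1], (i - 1, j - 1)))
            if i > 0:
                candidates.append((acc[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((acc[i][j - 1], (i, j - 1)))
            best, prev = min(candidates, key=lambda c: c[0])
            acc[i][j] = cost + best
            back[i][j] = prev

    # Trace the best path back from the highest bins to the lowest
    path = []
    node = (n - 1, m - 1)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]]
    path.reverse()
    return path, acc[n - 1][m - 1]


# Toy example: speaker B's pattern is shifted up by one bin relative to A
speaker_a = [[0.0], [1.0], [2.0], [3.0]]
speaker_b = [[0.0], [0.0], [1.0], [2.0], [3.0]]
path, total = align_frequency_axes(speaker_a, speaker_b)
print(path, total)
```

In the toy example, each bin carries a single made-up feature value. With real data, the per-bin feature vectors would be derived from the spectrograms of the two speakers, and the returned path would define the warping from one frequency axis to the other.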

doi: 10.21437/Interspeech.2007-120

Cite as: Liu, M., Zhou, X., Hasegawa-Johnson, M., Huang, T.S., Zhang, Z. (2007) Frequency domain correspondence for speaker normalization. Proc. Interspeech 2007, 274-277, doi: 10.21437/Interspeech.2007-120

  author={Ming Liu and Xi Zhou and Mark Hasegawa-Johnson and Thomas S. Huang and Zhengyou Zhang},
  title={{Frequency domain correspondence for speaker normalization}},
  booktitle={Proc. Interspeech 2007},