In this paper, we present a new approach to speaker recognition that uses prosodic information extracted from the original speech to resynthesize new speech data via spectrum modeling. The resynthesized data are modeled as sums of sinusoids parameterized by pitch, vibration amplitude, and phase bias. We extract cepstral features from the resynthesized speech for speaker modeling and scoring, in the same way as in traditional speaker recognition approaches. We then model these features with GMMs and compensate for speaker and channel variability using joint factor analysis. Experiments are carried out on the core condition of the NIST 2008 speaker recognition evaluation data. The results show that our proposed system achieves performance comparable to a state-of-the-art cepstral-based joint factor analysis system that uses the original data for speaker recognition.
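The abstract does not give the resynthesis equations, so the following is only a minimal sketch of harmonic sinusoidal synthesis driven by per-frame pitch, amplitude, and phase bias; the function name, the 1/k harmonic amplitude roll-off, and all parameter defaults are assumptions for illustration, not the authors' actual model.

```python
import numpy as np

def resynthesize(f0, amp, phase_bias, sr=16000, frame_len=0.01, n_harmonics=10):
    """Sketch: resynthesize speech from per-frame prosodic parameters.

    f0, amp, phase_bias -- equal-length arrays giving per-frame pitch (Hz),
    vibration amplitude, and phase bias, as named in the abstract.
    All defaults (sample rate, frame length, harmonic count) are assumed.
    """
    n = int(sr * frame_len)              # samples per frame
    t = np.arange(n)
    phase = np.zeros(n_harmonics)        # running phase per harmonic
    frames = []
    for F0, A, B in zip(f0, amp, phase_bias):
        frame = np.zeros(n)
        for k in range(1, n_harmonics + 1):
            # harmonic k of the pitch; 1/k roll-off is an arbitrary choice here
            frame += (A / k) * np.sin(phase[k - 1]
                                      + 2 * np.pi * k * F0 * t / sr + B)
            # accumulate phase so sinusoids stay continuous across frames
            phase[k - 1] = (phase[k - 1]
                            + 2 * np.pi * k * F0 * n / sr) % (2 * np.pi)
        frames.append(frame)
    return np.concatenate(frames)
```

The resulting waveform could then be fed to a standard cepstral front end (e.g. MFCC extraction) exactly as one would process the original recording.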
Bibliographic reference. Zhang, Xiang / Cao, Chuan / Yang, Lin / Suo, Hongbin / Zhang, Jianping / Yan, Yonghong (2010): "Speaker recognition using the resynthesized speech via spectrum modeling", In INTERSPEECH-2010, 2142-2145.