12th Annual Conference of the International Speech Communication Association

Florence, Italy
August 27-31, 2011

Super-Dirichlet Mixture Models Using Differential Line Spectral Frequencies for Text-Independent Speaker Identification

Zhanyu Ma, Arne Leijon

KTH, Sweden

A new text-independent speaker identification (SI) system is proposed. The system uses line spectral frequencies (LSFs) as an alternative feature set for capturing speaker characteristics. The boundary and ordering properties of the LSFs are taken into account by transforming the LSFs to the differential LSF (DLSF) space. Since dynamic information is useful for speaker recognition, we represent the dynamic information of the DLSFs by considering two neighbors of the current frame, one from the past frames and one from the following frames. The current frame and its two neighbor frames are cascaded into a supervector. The statistical distribution of this supervector is modelled by the so-called super-Dirichlet mixture model, which is an extension of the Dirichlet mixture model. Compared to the conventional SI system, which uses mel-frequency cepstral coefficients and is based on the Gaussian mixture model, the proposed SI system shows a promising improvement.
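The feature construction described above can be illustrated with a minimal sketch. The abstract does not state the exact transform or the frame spacing, so the following are assumptions made for illustration only: the DLSFs are taken as successive differences of the ordered LSFs, normalized by pi, and each supervector concatenates the frames at t - offset, t, and t + offset. The function names `lsf_to_dlsf` and `stack_supervectors` and the `offset` parameter are hypothetical and do not come from the paper.

```python
import math


def lsf_to_dlsf(lsf):
    """Map ordered LSFs in (0, pi) to normalized successive differences.

    Assumed definition (not given in the abstract):
    x_1 = s_1 / pi, x_i = (s_i - s_{i-1}) / pi for i > 1.
    """
    if not all(0.0 < s < math.pi for s in lsf):
        raise ValueError("LSFs must lie strictly inside (0, pi)")
    if any(b <= a for a, b in zip(lsf, lsf[1:])):
        raise ValueError("LSFs must be strictly increasing")
    padded = [0.0] + list(lsf)
    return [(b - a) / math.pi for a, b in zip(padded, padded[1:])]


def stack_supervectors(frames, offset=1):
    """Concatenate each frame with its neighbors at t - offset and t + offset.

    Frames without a neighbor on both sides are skipped (an assumption).
    """
    return [
        frames[t - offset] + frames[t] + frames[t + offset]
        for t in range(offset, len(frames) - offset)
    ]


# Toy input: four frames of three LSFs each (made-up values, not real speech).
lsf_frames = [
    [0.25 * math.pi, 0.50 * math.pi, 0.75 * math.pi],
    [0.20 * math.pi, 0.50 * math.pi, 0.80 * math.pi],
    [0.10 * math.pi, 0.40 * math.pi, 0.90 * math.pi],
    [0.30 * math.pi, 0.60 * math.pi, 0.70 * math.pi],
]
dlsf_frames = [lsf_to_dlsf(frame) for frame in lsf_frames]
supervectors = stack_supervectors(dlsf_frames, offset=1)
```

Under this assumed definition, every DLSF element is positive and the elements of one frame sum to less than one, which is what makes a Dirichlet-type model a natural fit for each sub-vector of the supervector.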


Bibliographic reference.  Ma, Zhanyu / Leijon, Arne (2011): "Super-dirichlet mixture models using differential line spectral frequencies for text-independent speaker identification", In INTERSPEECH-2011, 2349-2352.