Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Transformation Enhanced Multi-Grained Modeling for Text-Independent Speaker Recognition

Upendra V. Chaudhari, Jiri Navrátil, Stéphane H. Maes, Ramesh Gopinath

IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

We describe our formulation of transformation enhanced data modeling, which we use to develop a multi-grained data analysis approach to text-independent speaker recognition. The broad goal is to address difficulties caused by sparse training and test data. First, we detail our development of maximum likelihood transformation based recognition with diagonally constrained Gaussian mixture models and give results that show its robustness to decreasing amounts of training data. Then, using these models as building blocks, we develop a multi-grained model structure. For this, the training data must be labeled, e.g. with an HMM-based phone labeler. A graduated phone class structure is then used to train the speaker model at various levels of detail. This structure is a tree whose root node contains all the phones; subsequent levels partition the phones into increasingly fine-grained linguistic classes. We demonstrate the effectiveness of the modeling with identification and verification experiments.
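
The graduated phone class structure described in the abstract can be pictured with a small sketch. The Python below is only an illustration under stated assumptions, not the authors' implementation: the class name `PhoneClassNode`, the helper functions, and the toy phone inventory are all hypothetical, and the abstract does not say which linguistic classes were used. It builds a tree whose root holds every phone, checks that each level partitions its parent, and routes phone-labeled frames to every node on the path from the root, so that each node collects the data from which a model at its level of detail could be trained.

```python
from dataclasses import dataclass, field


@dataclass
class PhoneClassNode:
    """One node of a hypothetical phone class tree."""
    name: str
    phones: frozenset
    children: list = field(default_factory=list)

    def add_child(self, name, phones):
        """Attach a finer-grained class; it must be a subset of this node."""
        phones = frozenset(phones)
        if not phones <= self.phones:
            raise ValueError(f"{name} contains phones outside {self.name}")
        child = PhoneClassNode(name, phones)
        self.children.append(child)
        return child

    def is_partitioned(self):
        """True if, at every level, children are disjoint and cover the parent."""
        if not self.children:
            return True
        seen = set()
        for child in self.children:
            if seen & child.phones:
                return False  # overlapping classes
            seen |= child.phones
        return seen == self.phones and all(
            child.is_partitioned() for child in self.children
        )


def path_for_phone(root, phone):
    """Names of the nodes from the root down to the finest class holding phone."""
    path = []
    node = root
    while node is not None and phone in node.phones:
        path.append(node.name)
        node = next((c for c in node.children if phone in c.phones), None)
    return path


def group_frames(root, labeled_frames):
    """Collect each (phone, feature) frame under every node on its path."""
    buckets = {}
    for phone, feature in labeled_frames:
        for name in path_for_phone(root, phone):
            buckets.setdefault(name, []).append(feature)
    return buckets


def build_toy_tree():
    """Toy inventory (an assumption, chosen only to keep the example small)."""
    root = PhoneClassNode("all", frozenset({"aa", "iy", "uw", "s", "z", "m", "n"}))
    root.add_child("vowels", {"aa", "iy", "uw"})
    consonants = root.add_child("consonants", {"s", "z", "m", "n"})
    consonants.add_child("fricatives", {"s", "z"})
    consonants.add_child("nasals", {"m", "n"})
    return root
```

In this sketch, a frame labeled `s` contributes to the `all`, `consonants`, and `fricatives` buckets, so coarse nodes see more data than fine ones. That is one plausible way to read "various levels of detail"; how the per-node models are trained and combined is not specified in the abstract.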

Bibliographic reference. Chaudhari, Upendra V. / Navrátil, Jiri / Maes, Stéphane H. / Gopinath, Ramesh (2000): "Transformation enhanced multi-grained modeling for text-independent speaker recognition", in ICSLP-2000, vol. 2, 298-301.