EUROSPEECH 2003  INTERSPEECH 2003

We present an algorithm for clustering multivariate normal distributions based on the symmetric Kullback-Leibler divergence. The optimal mean vector and covariance matrix of the centroid normal distribution are derived, and a set of Riccati matrix equations is used to find the optimal covariance matrix. The solutions are found iteratively by alternating between the intermediate mean and covariance solutions. The clustering performance of the new algorithm is shown to be superior to that of the nonoptimal sample mean and covariance solutions: it achieves a lower overall distortion and a flatter distribution of pdf samples across clusters. The resulting optimal clusters were further tested on the Wall Street Journal database for adapting HMM parameters in a Structural Maximum A Posteriori Linear Regression (SMAPLR) framework. Recognition performance improved significantly: the word error rate was reduced from 32.6% for a nonoptimal centroid (sample mean and covariance) to 27.6% and 27.5% for the diagonal and full covariance matrix cases, respectively.
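As an illustration of the alternating centroid update described above, the following is a minimal sketch restricted to diagonal covariances, where each dimension can be treated independently and the Riccati matrix equation reduces to a scalar square root. It is not the authors' implementation: the function names `symmetric_kl_diag` and `centroid_diag`, the initialization from the sample mean and average variance, and the fixed iteration count are assumptions made for this example, and the update formulas follow from setting the derivatives of the summed symmetric divergence to zero in the one-dimensional case.

```python
import math


def symmetric_kl_diag(mean_p, var_p, mean_q, var_q):
    """Symmetric KL divergence, KL(p||q) + KL(q||p), of two diagonal Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        diff2 = (mp - mq) ** 2
        # The log-determinant terms of the two directed divergences cancel.
        total += 0.5 * (vp / vq + vq / vp - 2.0 + diff2 * (1.0 / vp + 1.0 / vq))
    return total


def centroid_diag(means, variances, n_iter=50):
    """Alternate mean and variance updates for a diagonal-covariance centroid.

    Assumption: n_iter fixed iterations stand in for a convergence test.
    """
    n, dim = len(means), len(means[0])
    # Assumed starting point: sample mean and average variance.
    c_mean = [sum(m[d] for m in means) / n for d in range(dim)]
    c_var = [sum(v[d] for v in variances) / n for d in range(dim)]
    for _ in range(n_iter):
        for d in range(dim):
            # Mean step: average weighted by the summed inverse variances.
            weights = [1.0 / v[d] + 1.0 / c_var[d] for v in variances]
            c_mean[d] = sum(w * m[d] for w, m in zip(weights, means)) / sum(weights)
            # Variance step: scalar analogue of the Riccati equation,
            # c_var * a * c_var = b, hence c_var = sqrt(b / a).
            a = sum(1.0 / v[d] for v in variances)
            b = sum(v[d] + (m[d] - c_mean[d]) ** 2 for m, v in zip(means, variances))
            c_var[d] = math.sqrt(b / a)
    return c_mean, c_var


def total_distortion(means, variances, c_mean, c_var):
    """Sum of symmetric divergences from every member to the centroid."""
    return sum(symmetric_kl_diag(m, v, c_mean, c_var) for m, v in zip(means, variances))
```

Because each step exactly minimizes the summed divergence in one variable while the other is held fixed, the total distortion cannot increase from the sample-based starting point, which is consistent with the lower overall distortion reported in the abstract. The full-covariance case requires solving the matrix Riccati equation and is not covered by this sketch.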
Bibliographic reference. Myrvoll, Tor Andre / Soong, Frank K. (2003): "On divergence based clustering of normal distributions and its application to HMM adaptation", In EUROSPEECH-2003, 1517-1520.