Sixth International Conference on Spoken Language Processing

This paper introduces a method that efficiently reduces acoustic model size and computation for LVCSR based on continuous-density hidden Markov models (CDHMMs). The method uses the Bhattacharyya distance as the criterion for quantizing the mean and variance vectors of the Gaussian mixtures. To minimize quantization error, the feature vector is separated into multiple streams (such as MFCCs, delta-MFCCs and delta-delta-MFCCs), and a modified K-means clustering algorithm is then applied to each stream. The key idea of the modified K-means algorithm is to dynamically split and merge clusters during each iteration according to each cluster's size and average distortion. The proposed approach cuts the acoustic model size by 87%, from 21.42 MB to 2.75 MB, for a CDHMM baseline system (12 mixtures, 6k states), using 256 and 8192 codewords for each stream of the mean and variance vectors of the Gaussian mixtures. Recognition experiments on a Chinese LVCSR dictation system (51K words) show that with the 87% smaller compact model, the WER increases by 5% relative, from 9.8% for the CDHMM baseline to 10.3%. After quantization, the Gaussian likelihoods can be precomputed only once at the beginning of every frame and stored in a lookup table, so computation during decoding is greatly reduced as well.
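The abstract names the Bhattacharyya distance as the clustering criterion but does not give the formula. As a minimal sketch, the following shows the standard Bhattacharyya distance between two Gaussians with diagonal covariances (the usual case for CDHMM acoustic models); the function name and list-based interface are illustrative assumptions, not the authors' implementation.

```python
import math

def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    mu1, var1, mu2, var2: per-dimension means and variances (sequences
    of equal length, variances strictly positive). Illustrative sketch;
    not the paper's code.
    """
    d = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance for this dimension
        # Mahalanobis-like term: penalizes mean separation
        d += 0.125 * (m1 - m2) ** 2 / v
        # Covariance-mismatch term: zero when v1 == v2
        d += 0.5 * math.log(v / math.sqrt(v1 * v2))
    return d
```

The distance is zero for identical Gaussians and grows with either mean separation or variance mismatch, which is what makes it a reasonable distortion measure for clustering mean and variance codewords separately.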
Bibliographic reference. Pan, Jielin / Yuan, Baosheng / Yan, Yonghong (2000): "Effective vector quantization for a highly compact acoustic model for LVCSR", In ICSLP-2000, vol. 4, 318-321.