Sixth International Conference on Spoken Language Processing
(ICSLP 2000)

Beijing, China
October 16-20, 2000

Comparative Study of GMM, DTW, and ANN on Thai Speaker Identification System

Chularat Tanprasert, Varin Achariyakulporn

Information Research and Development Division, National Electronics and Computer Technology Center, National Science and Technology Development Agency, Ministry of Science, Technology, and Environment, Rachathewi, Bangkok, Thailand

This paper investigates the Gaussian mixture model (GMM) for Thai text-dependent speaker identification by comparing it with our preliminary experiments on a multilayer perceptron (MLP) network trained with the backpropagation learning algorithm (BKP) and on dynamic time warping (DTW). The three identification engines were evaluated on 50 speakers uttering the isolated digits 0-9. Training and testing utterances were recorded over a five-week period. In addition, three well-known speech features were evaluated: linear predictive coding derived cepstrum (LPCC), postfiltered cepstrum (PFL), and Mel frequency cepstral coefficients (MFCC). In our previous experiments, MFCC gave the highest identification rates with DTW and MLP. GMM was therefore tested with the MFCC feature and attained 87.54% average identification accuracy, compared with 86.74% for DTW and 82.34% for MLP. The ranking is the same for the top-3 concatenated digits, where the average identification rates are 99%, 98.70%, and 97.30% for GMM, DTW, and MLP, respectively.
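
As an illustration of the template-matching idea behind one of the compared engines, the following is a minimal sketch of DTW-based speaker identification. It is not the authors' implementation: the abstract does not specify the DTW variant, local distance, or path constraints, so the sketch assumes the textbook recurrence with Euclidean frame distance. The names `dtw_distance`, `identify`, and `templates` are hypothetical, and the two-dimensional toy vectors merely stand in for per-frame MFCC features.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: textbook DTW with Euclidean local distance and the three
    standard steps (insertion, deletion, match); no slope constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of b
                                 cost[i][j - 1],      # skip a frame of a
                                 cost[i - 1][j - 1])  # match both frames
    return cost[n][m]


def identify(utterance, templates):
    """Return the speaker whose reference template has the lowest DTW cost."""
    return min(templates, key=lambda spk: dtw_distance(utterance, templates[spk]))


# Toy data (hypothetical): one reference template per speaker for one digit.
templates = {
    "speaker_a": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    "speaker_b": [[5.0, 5.0], [6.0, 6.0], [7.0, 7.0]],
}
# A slightly slower, noisy rendition of speaker_a's template.
test_utterance = [[0.0, 0.1], [0.0, 0.0], [1.1, 1.0], [2.0, 2.1]]

print(identify(test_utterance, templates))
```

In a text-dependent setting such as the one described, each speaker would have a reference template per digit, and the test utterance would be compared only against templates of the same digit. Because DTW aligns sequences of different lengths, differences in speaking rate between training and test recordings do not by themselves cause a mismatch.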



Bibliographic reference.  Tanprasert, Chularat / Achariyakulporn, Varin (2000): "Comparative study of GMM, DTW, and ANN on Thai speaker identification system", In ICSLP-2000, vol.2, 234-237.