This paper investigates the Gaussian mixture model (GMM) for a Thai text-dependent speaker identification system and compares it with our preliminary experiments using a multilayer perceptron (MLP) network trained with the backpropagation (BKP) learning algorithm and with dynamic time warping (DTW). The three identification engines were evaluated on 50 speakers uttering the isolated digits 0-9. Training and testing utterances were recorded over a five-week period. Furthermore, three well-known speech features were evaluated: linear predictive coding derived cepstrum (LPCC), postfiltered cepstrum (PFL), and Mel frequency cepstral coefficients (MFCC). In our previous experiments, MFCC gave the highest identification rates for both DTW and MLP. The GMM was therefore tested with MFCC features and attained an average identification accuracy of 87.54%, compared with 86.74% for DTW and 82.34% for MLP. The same ranking holds for the top-3 concatenated digits, with average identification rates of 99%, 98.70%, and 97.30% for GMM, DTW, and MLP, respectively.
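Two of the classifier families compared above, DTW template matching and GMM likelihood scoring, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that MFCC vectors have already been extracted for each utterance, uses a plain DTW with Euclidean frame distance and a single reference template per speaker, and assumes diagonal-covariance GMMs whose weights, means, and variances were trained beforehand (for example with EM). All function names and the toy 2-D data are hypothetical.

```python
import math

def dtw_distance(a, b):
    """DTW distance between two sequences of feature vectors, using the
    Euclidean frame distance and the (1,0), (0,1), (1,1) step pattern."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def identify_dtw(test, templates):
    """Closed-set identification: pick the speaker whose template is nearest."""
    return min(templates, key=lambda spk: dtw_distance(test, templates[spk]))

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of the frames under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xd, md, vd in zip(x, mu, var):
                log_p -= 0.5 * (math.log(2 * math.pi * vd) + (xd - md) ** 2 / vd)
            comp.append(log_p)
        peak = max(comp)
        # log-sum-exp over mixture components for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total

def identify_gmm(frames, models):
    """Pick the speaker whose GMM gives the highest total log-likelihood."""
    return max(models, key=lambda spk: gmm_log_likelihood(frames, *models[spk]))

# Toy usage with hypothetical 2-D "feature" vectors for two speakers.
templates = {"A": [[0.0, 0.0], [1.0, 1.0]], "B": [[5.0, 5.0], [6.0, 6.0]]}
models = {
    "A": ([1.0], [[0.5, 0.5]], [[1.0, 1.0]]),  # (weights, means, variances)
    "B": ([1.0], [[5.5, 5.5]], [[1.0, 1.0]]),
}
test = [[0.1, 0.0], [0.9, 1.1], [1.0, 1.0]]
print(identify_dtw(test, templates))  # A
print(identify_gmm(test, models))     # A
```

The DTW engine stores whole reference utterances and aligns them frame by frame, whereas the GMM engine summarizes each speaker as a probability density over frames and ignores their order. The top-3 concatenated-digit results in the abstract would correspond to combining scores across several digits, which this sketch does not attempt.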
Cite as: Tanprasert, C., Achariyakulporn, V. (2000) Comparative study of GMM, DTW, and ANN on Thai speaker identification system. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 2, 234-237, doi: 10.21437/ICSLP.2000-252
@inproceedings{tanprasert00_icslp, author={Chularat Tanprasert and Varin Achariyakulporn}, title={{Comparative study of GMM, DTW, and ANN on Thai speaker identification system}}, year=2000, booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)}, pages={vol. 2, 234-237}, doi={10.21437/ICSLP.2000-252} }