In this paper we describe a novel use of a multi-layer Kohonen self-organizing feature map (MLKSFM) for spoken language identification (LID). A normalized, segment-based input feature vector is used to preserve the temporal information of the speech signal. LID is performed with several system configurations of the MLKSFM. Compared with a baseline parallel phone recognition followed by language modeling (PPRLM) system, our system achieves a similar identification rate but requires less training time and no phone labeling of the training data. The MLKSFM with a sheet-shaped map and a hexagonal-lattice neighborhood relationship is found to give the best performance on the LID task, achieving LID rates of 76.4% and 62.4% on the 45-second and 10-second OGI speech utterances, respectively.
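To make the map topology named in the abstract concrete, the following is a minimal sketch of a single-layer Kohonen map on a sheet-shaped (non-wrapping) hexagonal lattice. It is not the authors' MLKSFM: the multi-layer arrangement, the segment-based features, and the language decision rule are not reproduced here. The function names (`hex_grid`, `train_som`, `best_matching_unit`), the Gaussian neighborhood, the linear decay schedules, and all default parameter values are assumptions chosen for illustration.

```python
import math
import random

def hex_grid(rows, cols):
    """Node coordinates on a sheet-shaped (non-wrapping) hexagonal lattice.

    Odd rows are shifted by half a cell and rows are sqrt(3)/2 apart,
    so every interior node has six neighbors at distance 1.
    """
    return [(c + 0.5 * (r % 2), r * math.sqrt(3) / 2)
            for r in range(rows) for c in range(cols)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def best_matching_unit(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))

def train_som(data, rows=4, cols=4, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a single-layer map; all hyperparameters are illustrative."""
    rng = random.Random(seed)
    dim = len(data[0])
    coords = hex_grid(rows, cols)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in coords]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays to 0
            sigma = sigma0 + (0.5 - sigma0) * frac   # radius moves from sigma0 to 0.5
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighborhood measured on the lattice, not in feature space
                h = math.exp(-sq_dist(coords[i], coords[bmu]) / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights, coords
```

Calling `train_som` on a list of equal-length feature vectors returns the trained weight vectors and their lattice coordinates; `best_matching_unit` then maps a new vector to its closest node. A sheet-shaped map simply means the lattice has edges, in contrast to cylindrical or toroidal maps whose neighborhoods wrap around.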
Cite as: Wang, L., Ambikairajah, E., Choi, E.H.C. (2007) Multi-layer Kohonen self-organizing feature map for language identification. Proc. Interspeech 2007, 174-177, doi: 10.21437/Interspeech.2007-73
@inproceedings{wang07b_interspeech,
  author={Liang Wang and Eliathamby Ambikairajah and Eric H. C. Choi},
  title={{Multi-layer Kohonen self-organizing feature map for language identification}},
  year=2007,
  booktitle={Proc. Interspeech 2007},
  pages={174--177},
  doi={10.21437/Interspeech.2007-73}
}