INTERSPEECH 2006 - ICSLP
The vector-based spoken language recognition approach converts a spoken utterance into a high-dimensional vector, also known as a bag-of-sounds vector, that consists of n-gram statistics of acoustic units. Dimensionality reduction better prepares the bag-of-sounds vectors for classifier design. We propose projecting the bag-of-sounds vectors onto a low-dimensional SVM output coding space, where each dimension represents a decision hyperplane between a pair of spoken languages. We also compare the performance of the output coding approach with that of the traditional low-rank approximation approach using latent semantic indexing (LSI) on the NIST 1996, 2003 and 2005 Language Recognition Evaluation (LRE) databases. The experiments show that the output coding approach consistently outperforms LSI and achieves competitive results.
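The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy two-unit inventory, the language labels, and the hyperplane weights are all hypothetical. In the approach described above the pairwise hyperplanes are SVM decision boundaries; here they are supplied by hand so that the projection itself is easy to follow.

```python
from collections import Counter
from itertools import combinations, product


def bag_of_sounds(tokens, inventory, n_max=2):
    """Relative n-gram frequencies (n = 1..n_max) over a fixed unit inventory."""
    vector = []
    for n in range(1, n_max + 1):
        # Count every n-gram of acoustic units that occurs in the utterance.
        counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        total = sum(counts.values()) or 1  # guard against very short utterances
        # One dimension per possible n-gram, in a fixed order.
        vector.extend(counts[gram] / total for gram in product(inventory, repeat=n))
    return vector


def output_code(vector, hyperplanes):
    """Signed score w.x + b for each pairwise hyperplane (w, b)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, vector)) + b for w, b in hyperplanes]


# Hypothetical setup: 2 acoustic units, 3 languages, hence 3 language pairs.
inventory = ["a", "b"]
languages = ["lang1", "lang2", "lang3"]
pairs = list(combinations(languages, 2))

# Assumption: in practice each (w, b) would come from an SVM trained on one pair.
hyperplanes = [
    ([1, -1, 0, 0, 0, 0], 0.0),
    ([0, 0, 1, 0, 0, -1], 0.0),
    ([0, 0, 0, 1, -1, 0], 0.5),
]

utterance = ["a", "b", "a", "a"]            # decoded acoustic-unit sequence
x = bag_of_sounds(utterance, inventory)     # 2 unigram + 4 bigram dimensions
code = output_code(x, hyperplanes)          # one dimension per language pair
```

With L languages the output code has L(L-1)/2 dimensions, one per language pair, whereas the bag-of-sounds dimension grows with the size of the unit inventory raised to the n-gram order.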
Bibliographic reference. Li, Haizhou / Ma, Bin / Tong, Rong (2006): "Vector-based spoken language recognition using output coding", In INTERSPEECH-2006, paper 1155-Mon2CaP.8.