8th European Conference on Speech Communication and Technology

Geneva, Switzerland
September 1-4, 2003


Mixed-Lingual Spoken Word Recognition by Using VQ Codebook Sequences of Variable Length Segments

Hiroaki Kojima (1), Kazuyo Tanaka (2)

(1) AIST, Japan
(2) University of Tsukuba, Japan

We are investigating unsupervised phone modeling. This paper describes a method for deriving VQ codebook sequences of variable-length segments from spoken word samples, and reports evaluation results obtained by applying the method to mixed-lingual speech recognition tasks that include non-native speakers. The VQ codebook is generated by a piecewise linear segmentation method comprising segmentation, alignment, reduction and clustering processes. The derived codebook sequences are evaluated by speaker-independent recognition of a word set that mixes English and Japanese words, uttered by both native English and native Japanese speakers. Using a codebook of 128 codes, the average recognition rates for the 618 mixed-lingual words are 89.7% for native English speakers and 79.4% for native Japanese speakers.
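The abstract names the stages of the codebook derivation but not their details. The sketch below is an illustration under stated assumptions, not the authors' method. It assumes that each word is a sequence of feature frames, that segmentation recursively splits a span at the frame farthest from the straight line between its endpoints (a standard piecewise linear approximation), that each variable-length segment is represented by its two endpoint frames concatenated (a hypothetical choice), and that plain k-means stands in for the clustering step. The alignment and reduction steps are omitted. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _max_deviation(frames: Sequence[Vector], start: int, end: int) -> Tuple[int, float]:
    """Index and squared distance of the frame farthest from the line frames[start] -> frames[end]."""
    worst_i, worst_d = start, 0.0
    span = end - start
    for i in range(start + 1, end):
        t = (i - start) / span
        interp = [a + t * (b - a) for a, b in zip(frames[start], frames[end])]
        d = _dist2(frames[i], interp)
        if d > worst_d:
            worst_i, worst_d = i, d
    return worst_i, worst_d


def piecewise_linear_segment(frames: Sequence[Vector], tol: float) -> List[int]:
    """Split recursively until every frame lies within squared distance tol of the
    linear interpolation of its segment. Returns sorted boundary frame indices."""
    boundaries = {0, len(frames) - 1}
    stack = [(0, len(frames) - 1)]
    while stack:
        s, e = stack.pop()
        if e - s < 2:
            continue  # no interior frames to test
        i, d = _max_deviation(frames, s, e)
        if d > tol:
            boundaries.add(i)
            stack.append((s, i))
            stack.append((i, e))
    return sorted(boundaries)


def segment_vectors(frames: Sequence[Vector], boundaries: List[int]) -> List[Vector]:
    """Hypothetical fixed-length segment representation: start frame + end frame."""
    return [list(frames[s]) + list(frames[e]) for s, e in zip(boundaries, boundaries[1:])]


def quantize(v: Sequence[float], codebook: List[Vector]) -> int:
    """Index of the nearest codeword."""
    return min(range(len(codebook)), key=lambda j: _dist2(v, codebook[j]))


def kmeans(vectors: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain k-means, standing in for the clustering step (assumption)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            clusters[quantize(v, centroids)].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def encode_word(frames: Sequence[Vector], codebook: List[Vector], tol: float) -> List[int]:
    """Map a word's frame sequence to a sequence of codebook indices."""
    bounds = piecewise_linear_segment(frames, tol)
    return [quantize(v, codebook) for v in segment_vectors(frames, bounds)]


if __name__ == "__main__":
    # Toy 1-D trajectory that rises and falls linearly: two segments.
    word = [[float(x)] for x in (0, 1, 2, 3, 4, 3, 2, 1, 0)]
    print(piecewise_linear_segment(word, 0.01))
    print(encode_word(word, [[0.0, 4.0], [4.0, 0.0]], 0.01))
```

In the paper the codebook has 128 codes and is trained on many speakers' word samples; the toy example uses a hand-written two-entry codebook only to show how a word becomes a code sequence. In practice `kmeans` would be run over the segment vectors pooled from all training words.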


Bibliographic reference.  Kojima, Hiroaki / Tanaka, Kazuyo (2003): "Mixed-lingual spoken word recognition by using VQ codebook sequences of variable length segments", In EUROSPEECH-2003, 2485-2488.