EUROSPEECH 2003 - INTERSPEECH 2003
We are investigating unsupervised phone modeling. This paper describes a method for deriving VQ codebook sequences of variable-length segments from spoken word samples, and reports evaluation results obtained by applying the method to mixed-lingual speech recognition tasks that include non-native speakers. The VQ codebook is generated based on a piecewise linear segmentation method that includes segmentation, alignment, reduction, and clustering processes. The derived codebook sequences are evaluated by speaker-independent recognition of a word set that mixes English and Japanese words. The speech samples are uttered by both native English and native Japanese speakers. With a codebook consisting of 128 codes, the average recognition rates for the 618 mixed-lingual words are 89.7% for native English speakers and 79.4% for native Japanese speakers.
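As a rough illustration of what a "codebook sequence" is, the sketch below trains a small VQ codebook with plain k-means and then encodes a sequence of segments as code indices. It is not the piecewise linear segmentation method of the paper: the segmentation, alignment, and reduction steps are omitted, and it assumes each variable-length segment has already been summarized as one fixed-dimension feature vector. The function names `train_codebook` and `encode`, and the toy data, are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vec: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry closest to vec (first entry wins ties)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def train_codebook(vectors: List[Vector], size: int,
                   iterations: int = 10) -> List[List[float]]:
    """Plain k-means clustering, used here as a stand-in for the paper's
    clustering step. Assumption: the first `size` vectors seed the codes."""
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        # Assign every vector to its nearest code.
        clusters: List[List[Vector]] = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest_code(v, codebook)].append(v)
        # Move each code to the mean of its members; empty clusters stay put.
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def encode(segments: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map a sequence of segment vectors to a sequence of code indices."""
    return [nearest_code(s, codebook) for s in segments]


# Toy data: four 2-dimensional segment vectors forming two obvious groups.
training = [[0.0, 0.0], [10.0, 10.0], [0.2, 0.1], [9.8, 10.1]]
codebook = train_codebook(training, size=2)

# A "word" made of three segments becomes a sequence of three codes.
word_codes = encode([[0.1, 0.0], [9.0, 9.5], [0.3, 0.3]], codebook)
```

In a recognizer of the kind the abstract describes, each word would be represented by such a code sequence and an utterance matched against the sequences of all candidate words. The matching step is left out here because the abstract does not specify it.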
Bibliographic reference. Kojima, Hiroaki / Tanaka, Kazuyo (2003): "Mixed-lingual spoken word recognition by using VQ codebook sequences of variable length segments", In EUROSPEECH-2003, 2485-2488.