A top-down sharing scheme based on decision trees is used to capture the enormous variability of Russian vowel data. The bootstrap part of the Russian database for telephone applications (TeCoRus) serves as the speech material for the experiments. It comprises approximately 6 hours of manually segmented readings by 6 speakers (3 male and 3 female) of a phonetically representative text of 510 sentences. Russian vowels were initially represented by a hierarchical structure of 53 classes based on the phonetic quality of the vowel, its stress characteristics, and nasalized realization. The designed question set comprised 57 questions addressed to the middle element of the triphone, while 98 questions (58 of them checking the identity of a particular phone) were applied symmetrically to the left and right contexts. Results of pilot experiments aimed at establishing an optimal set of broad phonetic classes, the questions for the tree nodes, and the resulting inventory of context-sensitive phones are presented.
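As an illustration of the general idea of top-down, question-driven clustering, the sketch below splits a pool of triphones by whichever context question gives the largest log-likelihood gain under a single Gaussian, and stops when the gain falls below a threshold. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not specify the splitting criterion, features, or thresholds, so the one-dimensional statistics, the question names, the toy triphones, and the `min_gain` value are all hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Triphone = Tuple[str, str, str]       # (left context, centre phone, right context)
Stats = Tuple[float, float, float]    # (frame count, sum, sum of squares), 1-D toy feature
Question = Tuple[int, Set[str]]       # (position: 0 = left, 2 = right; phone set)


def make_stats(n: float, mean: float, var: float) -> Stats:
    """Build sufficient statistics from a count, mean and variance (toy helper)."""
    return (n, n * mean, n * (var + mean * mean))


def pool(items: List[Triphone], data: Dict[Triphone, Stats]) -> Stats:
    """Sum the sufficient statistics of all triphones in a cluster."""
    return (sum(data[t][0] for t in items),
            sum(data[t][1] for t in items),
            sum(data[t][2] for t in items))


def log_likelihood(stats: Stats) -> float:
    """Log-likelihood of the pooled data under its own maximum-likelihood Gaussian."""
    n, s, ss = stats
    if n <= 0:
        return 0.0
    mean = s / n
    var = max(ss / n - mean * mean, 1e-4)   # variance floor
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def cluster(items: List[Triphone], data: Dict[Triphone, Stats],
            questions: Dict[str, Question], min_gain: float) -> List[List[Triphone]]:
    """Recursively split `items` by the best question; return the leaf clusters."""
    parent_ll = log_likelihood(pool(items, data))
    best = None
    for pos, phones in questions.values():
        yes = [t for t in items if t[pos] in phones]
        no = [t for t in items if t[pos] not in phones]
        if not yes or not no:
            continue                        # question does not divide this node
        gain = (log_likelihood(pool(yes, data))
                + log_likelihood(pool(no, data)) - parent_ll)
        if best is None or gain > best[0]:
            best = (gain, yes, no)
    if best is None or best[0] < min_gain:
        return [sorted(items)]              # leaf: these triphones share one model
    return (cluster(best[1], data, questions, min_gain)
            + cluster(best[2], data, questions, min_gain))


# Hypothetical toy data: the vowel "a" after hard vs. palatalized consonants.
data = {
    ("t", "a", "k"): make_stats(50, 0.0, 0.01),
    ("d", "a", "k"): make_stats(50, 0.1, 0.01),
    ("t'", "a", "k"): make_stats(50, 2.0, 0.01),
    ("d'", "a", "k"): make_stats(50, 2.1, 0.01),
}
questions = {
    "L_palatalized": (0, {"t'", "d'"}),
    "L_voiced": (0, {"d", "d'"}),
    "R_velar": (2, {"k"}),
}
clusters = cluster(list(data), data, questions, min_gain=20.0)
```

With this toy data the palatalization question separates the two groups of left contexts, while the voicing question yields too little gain to justify a further split, so the four triphones end up sharing two models.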
Cite as: Kouznetsov, V., Chuchupal, V. (2004) Increasing trainability of ASR system by means of top-down clustering procedure based on decision trees (vowel data for Russian). Proc. 9th Conference on Speech and Computer (SPECOM 2004), 289-290
@inproceedings{kouznetsov04_specom, author={V. Kouznetsov and V. Chuchupal}, title={{Increasing trainability of ASR system by means of top-down clustering procedure based on decision trees (vowel data for Russian)}}, year=2004, booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)}, pages={289--290} }