ISCA Archive SPECOM 2004

Automatic vowel recognition in fluent speech (on the material of the Russian language)

Daniil A. Kocharov

This paper introduces a vowel recognition method based on pitch-synchronous signal processing. The work was carried out as part of the development of a speaker-independent system for automatic identification of speech sounds. The length of the signal-processing window equals the pitch period, which makes the analysis less dependent on the pitch value than analysis with a fixed window. It is known that analysis is most effective when the window length is an integer multiple of the pitch period; the smallest such window, one pitch period, is therefore used here. The vowel patterns were generated using knowledge of the phonological system and phonetic rules of the Russian language. Experiments have shown that a phonetically based pattern dictionary is no less effective for speaker-independent speech recognition than one generated by cluster analysis. The proposed vowel recognition method was tested on two sets of material: isolated vowels manually extracted from a phonetically representative text read by a standard male speaker of Russian, and isolated words read by 10 male and 10 female speakers of Russian. For the second set, vowels were automatically extracted and then identified. An average recognition accuracy of 85.0% was obtained, which is an encouraging result.
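To illustrate the idea of a window that is exactly one pitch period long, here is a minimal Python sketch. It is not the author's implementation: the abstract does not say how pitch marks are found or which spectral analysis is applied, so the pitch marks are assumed to be known in advance, a plain DFT stands in for the analysis step, and the names `pitch_synchronous_frames` and `dft_magnitudes` are hypothetical. The test signal is synthetic.

```python
import cmath
import math


def pitch_synchronous_frames(signal, pitch_marks):
    """Cut one frame per pitch period (hypothetical helper).

    Frame i spans the samples from pitch_marks[i] up to, but not
    including, pitch_marks[i + 1].
    """
    return [signal[start:end] for start, end in zip(pitch_marks, pitch_marks[1:])]


def dft_magnitudes(frame):
    """Return normalized DFT magnitudes for bins 0 .. len(frame) // 2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))) / n
        for k in range(n // 2 + 1)
    ]


# Synthetic periodic signal (assumption for illustration only):
# period of 80 samples, with energy in the 1st and 3rd harmonics.
PERIOD = 80
signal = [
    math.sin(2 * math.pi * t / PERIOD) + 0.5 * math.sin(2 * math.pi * 3 * t / PERIOD)
    for t in range(4 * PERIOD)
]

# Assumed to be known; a real system would need a pitch-mark detector.
pitch_marks = list(range(0, len(signal) + 1, PERIOD))

frames = pitch_synchronous_frames(signal, pitch_marks)
spectrum = dft_magnitudes(frames[0])

# With a one-period window, DFT bin k lines up with harmonic k,
# so the energy sits in bins 1 and 3 and the other bins are near zero.
leakage = max(m for k, m in enumerate(spectrum) if k not in (1, 3))
```

Because the window covers exactly one period, each harmonic falls on a single DFT bin regardless of the pitch value, which is one way to read the claim that the analysis becomes less dependent on pitch than with a fixed window.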


Cite as: Kocharov, D.A. (2004) Automatic vowel recognition in fluent speech (on the material of the Russian language). Proc. 9th Conference on Speech and Computer (SPECOM 2004), 308-309

@inproceedings{kocharov04_specom,
  author={Daniil A. Kocharov},
  title={{Automatic vowel recognition in fluent speech (on the material of the Russian language)}},
  year=2004,
  booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)},
  pages={308--309}
}