In this paper we present an experimental investigation of using various phone sets for acoustic modeling of Lithuanian speech, applied to large vocabulary continuous speech recognition. The paper presents specifics of Lithuanian speech acoustics, including accentuation, diphthongs, and the softening and assimilation of consonants. The speech recognition experiments use only an acoustic model, since effective language modeling for the highly inflected Lithuanian language is still under research. Several Lithuanian phone sets are proposed for evaluation in speech recognition experiments. A new Lithuanian broadcast news corpus, LRNO, is presented. Phone occurrence frequencies in 9 hours of speech training data are given for multiple Lithuanian phone sets. Recognition performance of Hidden Markov Models based on the multiple proposed simple and contextual phone sets is evaluated using the HTK toolkit. Experimental results are presented in figures comparing word error rates across phone sets. The conclusions indicate the influence on recognition performance of modeling various linguistic features, such as accent, softness, mixed diphthongs, affricates, and context; recommend a phone set for Lithuanian speech recognition; and point out future research directions.
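Phone sets are compared here by word error rate (WER), conventionally defined as (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment of the recognized word string against a reference of N words. HTK ships its own scoring tool (HResults); the sketch below is an independent reimplementation of the standard definition, assumed for illustration only, and is not the authors' scoring code. The function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using a word-level Levenshtein alignment.

    Hypothetical helper for illustration; not taken from HTK or the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # d[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )

    # Total edit count divided by the number of reference words
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical outputs of two systems built on different phone sets
    ref = "šiandien lietuvos radijas praneša naujienas"
    print(word_error_rate(ref, "šiandien lietuvos radijas praneša naujienos"))
    print(word_error_rate(ref, "šiandien lietuva radijas naujienas"))
```

With a five-word reference, the first hypothesis has one substitution (WER 0.2) and the second has one substitution plus one deletion (WER 0.4). Because WER can exceed 1.0 when insertions are frequent, it is reported as an error rate rather than a bounded accuracy.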
Cite as: Silingas, D., Laurinciukaite, S., Telksnys, L. (2004) Towards acoustic modeling of Lithuanian speech. Proc. 9th Conference on Speech and Computer (SPECOM 2004), 326-332
@inproceedings{silingas04_specom, author={Darius Silingas and Sigita Laurinciukaite and Laimutis Telksnys}, title={{Towards acoustic modeling of Lithuanian speech}}, year=2004, booktitle={Proc. 9th Conference on Speech and Computer (SPECOM 2004)}, pages={326--332} }