Converting the orthographic (written) form of text into a phonetic transcription is the first stage of a text-to-speech system. This study presents algorithms for text-to-phoneme (TTP) conversion of the Farsi language. Word formation rules are investigated and implemented using a lexicon of about 15000 base morphemes. The written sentence must also be segmented into words before any phonetic transcription of the text can take place. Because of the special form of Farsi orthography, this word segmentation process is complicated. To solve the problem, a fast on-line algorithm and a more complex off-line algorithm are presented. The overall performance of the TTP conversion is evaluated to be more than 90%.
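As a rough illustration of how a morpheme lexicon can drive both segmentation and transcription, the following minimal sketch applies greedy longest-match segmentation to an unspaced string and then looks up each morpheme's phonemes. It is not the on-line or off-line algorithm of the paper, which the abstract does not specify. The lexicon entries, the romanized spellings, the phoneme symbols, and the function names `segment` and `to_phonemes` are all hypothetical placeholders.

```python
from typing import Dict, List, Optional

# Hypothetical toy lexicon: written morpheme -> phonemic transcription.
# Romanized placeholders only; these are NOT entries from the paper's lexicon.
LEXICON: Dict[str, str] = {
    "ktab": "k e t A b",
    "xanh": "x A n e",
    "ha": "h A",
}


def segment(text: str, lexicon: Dict[str, str]) -> Optional[List[str]]:
    """Greedy longest-match segmentation of an unspaced string.

    Returns the list of morphemes, or None if some position cannot be
    matched. Greedy matching is an assumption made for this sketch; it
    does not backtrack, so it can fail where another split would succeed.
    """
    max_len = max(len(entry) for entry in lexicon)
    morphemes: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                morphemes.append(text[i:j])
                i = j
                break
        else:
            return None  # no lexicon entry starts at position i
    return morphemes


def to_phonemes(text: str, lexicon: Dict[str, str]) -> Optional[str]:
    """Segment the text, then join the per-morpheme transcriptions."""
    parts = segment(text, lexicon)
    if parts is None:
        return None
    return " | ".join(lexicon[m] for m in parts)


if __name__ == "__main__":
    print(segment("ktabha", LEXICON))
    print(to_phonemes("ktabxanh", LEXICON))
```

A lexicon lookup alone does not handle ambiguous splits or words outside the lexicon. That limitation is one plausible reason a system would pair a fast method with a slower, more thorough one.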
Cite as: Sadigh, M.R., Sheikhzadeh, H., Jahangir, M.R., Farzan, A. (2000) A rule-based approach to farsi language text-to-phoneme conversion. Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000), vol. 1, 532-535, doi: 10.21437/ICSLP.2000-132
@inproceedings{sadigh00_icslp,
  author={Mohammad Reza Sadigh and Hamid Sheikhzadeh and M. R. Jahangir and Arash Farzan},
  title={{A rule-based approach to farsi language text-to-phoneme conversion}},
  year=2000,
  booktitle={Proc. 6th International Conference on Spoken Language Processing (ICSLP 2000)},
  pages={vol. 1, 532-535},
  doi={10.21437/ICSLP.2000-132}
}