5th International Conference on Spoken Language Processing
Knowledge of phonetic unit frequencies is essential for developing databases for both concatenative synthesis and continuous speech recognition. In the present work, a large text corpus was processed and phonetically transcribed to obtain allophone and diphone frequencies for Catalan. The corpus was drawn from newspaper articles, which contained many foreign words that complicated the normalisation process. After automatic transcription, units were counted to obtain their relative frequencies, and the results were compared with those of other analyses. Finally, the diphones found in the corpus were compared with the units of a synthesis database to validate both the normalisation and transcription modules and the synthesis unit database itself.
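The counting and comparison steps can be illustrated with a minimal sketch. It assumes the automatic transcription is already available as a flat list of phone symbols. The function names, the toy phone sequence, the `#` silence marker, and the small diphone inventory are all hypothetical and are not taken from the paper, which does not describe its tooling here.

```python
from collections import Counter


def relative_frequencies(units):
    """Map each distinct unit to its share of the total unit count."""
    counts = Counter(units)
    total = sum(counts.values())
    return {unit: n / total for unit, n in counts.items()}


def diphones(phones):
    """Return adjacent phone pairs, e.g. [k, a, z] -> [(k, a), (a, z)]."""
    return list(zip(phones, phones[1:]))


def missing_from_inventory(found, inventory):
    """Diphones seen in the corpus but absent from the synthesis inventory."""
    return sorted(set(found) - set(inventory))


# Hypothetical toy transcription; '#' stands for silence at the boundaries.
phones = ["#", "k", "a", "z", "a", "#"]

allophone_freq = relative_frequencies(phones)
diphone_freq = relative_frequencies(diphones(phones))

# Hypothetical synthesis inventory that lacks the ('a', '#') unit.
inventory = {("#", "k"), ("k", "a"), ("a", "z"), ("z", "a")}
missing = missing_from_inventory(diphones(phones), inventory)
```

In a check of this kind, a diphone that appears in `missing` may point either to a gap in the synthesis database or to an error in normalisation or transcription, which is why the comparison can serve to validate both sides.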
Bibliographic reference. Esquerra, Ignasi / Febrer, Albert / Nadeu, Climent (1998): "Frequency analysis of phonetic units for concatenative synthesis in Catalan", In ICSLP-1998, paper 0817.