September 22-25, 1997
Working with large text corpora highlights the need for special treatment of Out-Of-Vocabulary (OOV) words. This paper describes a strategy for processing OOV words within a Text-To-Speech (TTS) framework for French. A probabilistic module, called "Devin", guesses a Part-Of-Speech (POS) label for each OOV word according to the morphological structure of the word and the context in which it occurs. These POS labels can be either syntactic or semantic. The semantic labels represent the category of each proper name (family name, town name, etc.) and its linguistic origin, which has a strong influence on its pronunciation. According to these POS labels, the system chooses the set of rules to be used by the rule-based grapheme-to-phoneme transcriber of the TTS system.
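The abstract does not specify the model behind "Devin", so the following is only a minimal illustrative sketch of the general idea: score each candidate label by combining a morphological cue (here, simply the last three letters of the word) with a contextual cue (the label of the preceding word), then map the winning label to a transcription rule set. The training examples, the label names (such as `NAME_EN` for an English-origin proper name), the rule-set names, and the functions `guess_pos` and `select_rule_set` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy data: (word, label, label of the previous word).
TRAIN = [
    ("rapidement", "ADV", "VERB"),
    ("lentement", "ADV", "VERB"),
    ("nation", "NOUN", "DET"),
    ("station", "NOUN", "DET"),
    ("chanter", "VERB", "PRON"),
    ("parler", "VERB", "PRON"),
    ("washington", "NAME_EN", "PREP"),
    ("kensington", "NAME_EN", "PREP"),
]

SUFFIX_LEN = 3  # assumption: a fixed-length suffix stands in for morphology
TAGS = sorted({tag for _, tag, _ in TRAIN})
SUFFIXES = {word[-SUFFIX_LEN:] for word, _, _ in TRAIN}

suffix_counts = Counter((word[-SUFFIX_LEN:], tag) for word, tag, _ in TRAIN)
tag_counts = Counter(tag for _, tag, _ in TRAIN)
context_counts = Counter((prev, tag) for _, tag, prev in TRAIN)
prev_counts = Counter(prev for _, _, prev in TRAIN)


def guess_pos(word, prev_tag):
    """Return the label maximising log P(suffix | tag) + log P(tag | prev_tag)."""
    suffix = word[-SUFFIX_LEN:]

    def score(tag):
        # Add-one smoothing keeps unseen suffixes and contexts at non-zero probability.
        p_suffix = (suffix_counts[(suffix, tag)] + 1) / (
            tag_counts[tag] + len(SUFFIXES) + 1
        )
        p_context = (context_counts[(prev_tag, tag)] + 1) / (
            prev_counts[prev_tag] + len(TAGS)
        )
        return math.log(p_suffix) + math.log(p_context)

    return max(TAGS, key=score)


# Hypothetical mapping from guessed label to a grapheme-to-phoneme rule set.
RULE_SETS = {"NAME_EN": "english_name_rules"}


def select_rule_set(tag):
    """Fall back to default French rules when no origin-specific set exists."""
    return RULE_SETS.get(tag, "french_default_rules")


if __name__ == "__main__":
    for word, prev_tag in [("wellington", "PREP"), ("doucement", "VERB")]:
        tag = guess_pos(word, prev_tag)
        print(word, tag, select_rule_set(tag))
```

In this toy setup, an unseen word ending in "ton" after a preposition is labelled as an English-origin name and routed to the corresponding rule set, while an unseen word ending in "ent" after a verb is labelled as an adverb and keeps the default French rules. A realistic system would need far richer morphological features and a much larger training corpus.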
Bibliographic reference. Béchet, Frédéric / El-Bèze, Marc (1997): "Automatic assignment of part-of-speech to out-of-vocabulary words for text-to-speech processing", In EUROSPEECH-1997, 983-986.