8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Homograph Ambiguity Resolution in Front-End Design for Portuguese TTS Systems

Daniela Braga (1), Luís Coelho (2), Fernando Gil V. Resende Jr. (3)

(1) Microsoft, Portugal
(2) Instituto Politécnico do Porto, Portugal
(3) Federal University of Rio de Janeiro, Brazil

This paper proposes a module for homograph disambiguation in Portuguese Text-to-Speech (TTS). The module combines a part-of-speech (POS) parser, which disambiguates homographs that belong to different parts of speech, with a semantic analyzer, which disambiguates homographs that belong to the same part of speech. The proposed algorithms are meant to resolve a significant part of the homograph ambiguity in European Portuguese (EP), covering 106 homograph pairs so far. The system is ready to be integrated into a Letter-to-Sound (LTS) converter. The algorithms were trained and tested on different corpora, and the experiments yielded an accuracy rate of 97.8%. The methodology is also valid for Brazilian Portuguese (BP), since 95 homograph pairs are exactly the same as in EP. A comparison with a probabilistic approach is also presented and the results are discussed.
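As a rough illustration of the two-stage idea described in the abstract, the sketch below routes a homograph either through a POS-based lookup or through a context-cue lookup. It is only a minimal sketch under stated assumptions, not the authors' algorithm: the lexicon entries ("gosto", "sede"), the cue words, the bracketed vowel notation, the tag names, and the function `disambiguate` are all hypothetical and chosen for illustration, and the POS tags are assumed to come from an external tagger.

```python
from typing import List, Optional

# Hypothetical lexicon: homographs whose readings differ by part of speech.
# Pronunciations are shown informally by marking the stressed vowel.
POS_HOMOGRAPHS = {
    "gosto": {"NOUN": "g[o]sto", "VERB": "g[ɔ]sto"},
}

# Hypothetical lexicon: homographs with the same part of speech, separated
# here by simple context cue words (a stand-in for a semantic analyzer).
SEMANTIC_HOMOGRAPHS = {
    "sede": [
        ({"água", "beber", "fome"}, "s[e]de"),           # "thirst"
        ({"empresa", "partido", "edifício"}, "s[ɛ]de"),  # "headquarters"
    ],
}


def disambiguate(tokens: List[str], pos_tags: List[str], index: int) -> Optional[str]:
    """Return a pronunciation for tokens[index], or None if undecided."""
    word = tokens[index].lower()

    # Stage 1: the POS tag alone selects the reading.
    if word in POS_HOMOGRAPHS:
        return POS_HOMOGRAPHS[word].get(pos_tags[index])

    # Stage 2: same POS, so count cue words found in the rest of the sentence.
    if word in SEMANTIC_HOMOGRAPHS:
        context = {t.lower() for i, t in enumerate(tokens) if i != index}
        best, best_score = None, 0
        for cues, pronunciation in SEMANTIC_HOMOGRAPHS[word]:
            score = len(cues & context)
            if score > best_score:
                best, best_score = pronunciation, score
        return best

    # Not a known homograph: leave it to the regular LTS rules.
    return None


if __name__ == "__main__":
    sentence = ["a", "sede", "da", "empresa"]
    tags = ["DET", "NOUN", "ADP", "NOUN"]
    print(disambiguate(sentence, tags, 1))
```

Returning `None` when no rule fires leaves the decision to whatever default the Letter-to-Sound converter applies, which keeps this sketch separate from the downstream conversion step.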


Bibliographic reference. Braga, Daniela / Coelho, Luís / Resende Jr., Fernando Gil V. (2007): "Homograph ambiguity resolution in front-end design for Portuguese TTS systems", In INTERSPEECH-2007, 1761-1764.