1st Joint SIG-IL/Microsoft Workshop on Speech and Language Technologies for Iberian Languages

Porto Salvo, Portugal
September 3-4, 2009

CORPOR System: Corpora of the Portuguese Language as Spoken in São Paulo

Zilda Maria Zapparoli

Universidade de São Paulo, CNPq, FAPESP, Brasil

This work briefly discusses the construction of the Orthographic and Phonetic Information Databases of the Portuguese Language Spoken in the State of São Paulo (São Paulo City, Campinas, Itu) in a relational database system. Computational resources were used to store, process and analyze authentic spoken language. The databases hold orthographic and phonetic information about Portuguese as spoken in those areas of the state, organized, listed and stored according to linguistic and extralinguistic annotations. The results can support, for example, studies that require automatic processing of the Portuguese language.

Index Terms: Linguistic Informatics, data processing technologies in Linguistic studies, CorPor project, relational database system, databanks of phonetic and orthographic information about the Portuguese language as spoken in São Paulo, electronic corpora of the Portuguese language as spoken in São Paulo

Bibliographic reference. Zapparoli, Zilda Maria (2009): "CORPOR system: corpora of the Portuguese language as spoken in São Paulo", In SLTECH-2009, 35-38.