September 22-25, 1997
This paper presents the first attempts to create speech databases for Polish. Of the two databases, one supported by the Polish National Research Committee and one by COPERNICUS project 1304 ("BABEL: a Multi-Language Database" for Polish, Bulgarian, Estonian, Hungarian and Romanian), the first is presented in detail. Its speech material contains 365 utterances (alphabet letters, digits, 200 first names and 114 sentences) spoken by 45 speakers. The paper describes the design ideas, recording conditions, annotation rules, and the method of automatic segmentation and labelling used in CORPORA.
Bibliographic reference. Grocholewski, Stefan (1997): "CORPORA - speech database for polish diphones", In EUROSPEECH-1997, 1735-1738.