5th European Conference on Speech Communication and Technology

Rhodes, Greece
September 22-25, 1997

CORPORA - Speech Database for Polish Diphones

Stefan Grocholewski

Institute of Computing Science, Poznan University of Technology, Poznan, Poland

This paper presents efforts to create the first speech databases for Polish. Of the two databases, supported by the Polish National Research Committee and by the COPERNICUS project (1304, "BABEL: a Multi-Language Database" for Polish, Bulgarian, Estonian, Hungarian, and Romanian), the first, CORPORA, is described in detail. The speech material consists of 365 utterances (alphabet letters, digits, 200 first names, and 114 sentences) spoken by 45 speakers. The paper describes the design ideas, recording conditions, annotation rules, and the method of automatic segmentation and labelling used in CORPORA.

Bibliographic reference. Grocholewski, Stefan (1997): "CORPORA - speech database for Polish diphones", In EUROSPEECH-1997, 1735-1738.