We present a prosodic corpus in five languages (French, English, Italian, German and Spanish) comprising 4 hours and 20 minutes of speech from 50 different speakers (5 male and 5 female per language). The recordings on which the corpus is based are extracted from the EUROM 1 speech database and consist of passages of about five sentences. The corpus was stylized automatically by an algorithm that factors out microprosodic effects and represents the intonation contour of each utterance as a series of target points. Once interpolated by a smooth curve (spline), these points produce a contour that is indistinguishable from the original when re-synthesized, apart from a few detection errors. A symbolic coding of the 50,000 pitch movements of the corpus is also provided, along with the time alignment of the orthographic transcription with the signal at the word level. The entire corpus was verified and manually corrected by experts for each language. It will be made available at production cost for research through the European Language Resources Association (ELRA).
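As an illustration of how a series of target points can be turned back into a continuous contour, the following minimal Python sketch interpolates between successive targets. The abstract only says "spline", so the specific form used here is an assumption: a piecewise-quadratic curve made of two parabolas per interval, joined at the midpoint, with zero slope at each target. The function name `interpolate_targets` and the example target values are hypothetical and do not come from the corpus.

```python
from bisect import bisect_right


def interpolate_targets(targets, t):
    """Evaluate a smooth contour at time t from (time_s, f0_hz) target points.

    Assumption: piecewise-quadratic interpolation (two parabolas per interval,
    joined at the midpoint, zero slope at each target). Target times must be
    strictly increasing.
    """
    times = [p[0] for p in targets]
    # Outside the covered range, hold the first or last target value.
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]
    # Locate the interval [t1, t2] that contains t.
    i = bisect_right(times, t) - 1
    (t1, h1), (t2, h2) = targets[i], targets[i + 1]
    u = (t - t1) / (t2 - t1)  # normalized position within the interval, 0..1
    if u <= 0.5:
        # First parabola: leaves the left target with zero slope.
        return h1 + (h2 - h1) * 2.0 * u * u
    # Second parabola: arrives at the right target with zero slope.
    return h2 - (h2 - h1) * 2.0 * (1.0 - u) ** 2


# Hypothetical target points (time in seconds, F0 in Hz), for illustration only.
targets = [(0.0, 120.0), (0.4, 180.0), (1.0, 100.0)]

# Sample the contour every 10 ms.
contour = [interpolate_targets(targets, k / 100) for k in range(101)]
```

The two parabolas meet at the midpoint of each interval with equal value and slope, so the resulting curve is continuous and smooth and passes exactly through every target point.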
Cite as: Campione, E., Véronis, J. (1998) A multilingual prosodic database. Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998), paper 0844, doi: 10.21437/ICSLP.1998-609
@inproceedings{campione98b_icslp,
  author={Estelle Campione and Jean Véronis},
  title={{A multilingual prosodic database}},
  year=1998,
  booktitle={Proc. 5th International Conference on Spoken Language Processing (ICSLP 1998)},
  pages={paper 0844},
  doi={10.21437/ICSLP.1998-609}
}