8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

MuLAS: A Framework for Automatically Building Multi-Tier Corpora

Sérgio Paulo, Luís C. Oliveira

L2F INESC-ID/IST, Portugal

The Multi-Level Alignment System (MuLAS) is the L2F tool for building multi-tier speech corpora with little or no human intervention. MuLAS automatically combines information from external speech annotations, whether produced by humans or by machines, with the text-based utterance descriptions it generates itself. The result is a more reliable and complete description of each spoken utterance.
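The abstract does not say how the external annotation and the text-based description are combined. One common way to synchronize two tiers is to align their label sequences with an edit-distance (Levenshtein) alignment and copy the timestamps across matched or substituted labels. The sketch below illustrates that idea under this assumption only; it is not the MuLAS algorithm. The function names `align_labels` and `transfer_times`, and the phone labels used, are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical segment format: (label, start_seconds, end_seconds).
Segment = Tuple[str, Optional[float], Optional[float]]


def align_labels(a: List[str], b: List[str]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Levenshtein-align two label sequences and return index pairs.

    (i, j) pairs a[i] with b[j] (match or substitution); (i, None) means
    a[i] has no counterpart in b, and (None, j) means b[j] has none in a.
    """
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match / substitution
                             cost[i - 1][j] + 1,        # label only in a
                             cost[i][j - 1] + 1)        # label only in b
    # Backtrace from the bottom-right corner to recover one optimal path.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def transfer_times(text_phones: List[str], external: List[Segment]) -> List[Segment]:
    """Attach the external tier's times to the text-derived phone tier.

    Phones with no aligned external segment keep (None, None) times.
    """
    timed: List[Segment] = [(p, None, None) for p in text_phones]
    labels = [seg[0] for seg in external]
    for ti, ei in align_labels(text_phones, labels):
        if ti is not None and ei is not None:
            _, start, end = external[ei]
            timed[ti] = (text_phones[ti], start, end)
    return timed


if __name__ == "__main__":
    text_tier = ["h", "@", "l", "@U"]
    external_tier = [("h", 0.00, 0.05), ("e", 0.05, 0.12),
                     ("l", 0.12, 0.20), ("@U", 0.20, 0.35)]
    print(transfer_times(text_tier, external_tier))
```

Here the text-derived tier keeps its own labels (for example `@`, even where the external annotation says `e`) and only borrows timing. Segments left with `None` times would need some further rule, such as interpolating from their neighbours; that step is likewise an assumption and is not described in the abstract.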

This paper presents the methods for synchronizing multi-tier annotations that underlie MuLAS. These methods have allowed us to extend multi-tier corpus building to new languages with little effort. MuLAS has been successfully applied to building multi-tier corpora for speech synthesis in American and British English, European Portuguese, and German. It has also benefited natural prosody generation, since prosodic models can be derived from the corpora that MuLAS builds.

Bibliographic reference.  Paulo, Sérgio / Oliveira, Luís C. (2007): "MuLAS: a framework for automatically building multi-tier corpora", In INTERSPEECH-2007, 1525-1528.