This paper describes the main components of "Eloquens", the CSELT diphone-based text-to-speech synthesis system for the Italian language. The general architecture comprises the following major modules: text analysis and lexicon access; linguistic/phonetic processing and prosodic control in a multi-level rule-writing environment; the diphone repertory; and the synthesizer model. Recent progress, especially in text processing, in the acoustic unit set and concatenation technique, and in prosodic rule development, has markedly improved the overall quality of the synthetic speech. Evaluation tests indicate an increase in both segmental and word intelligibility in a speech synthesis application task. The software is structured to allow easy porting to several platforms and real-time implementation, with multi-channel capabilities and interactive functions.
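To make the role of the diphone repertory concrete, here is a minimal sketch of how a diphone-based synthesizer might turn a phonetic transcription into the list of units to fetch and concatenate. Each diphone spans the transition from one phone into the next, and a silence symbol pads both ends of the utterance. This is not the Eloquens implementation: the function name `phonemes_to_diphones`, the `_` silence symbol, the `a-b` unit naming, and the phoneme symbols are all hypothetical choices made for illustration.

```python
def phonemes_to_diphones(phonemes, silence="_"):
    """Map a phoneme sequence to the diphone units covering each transition.

    Hypothetical sketch: unit names are "left-right" strings, and a silence
    symbol is added at both ends so the first and last phones are covered by
    silence-to-phone and phone-to-silence units.
    """
    padded = [silence] + list(phonemes) + [silence]
    # One diphone per adjacent pair of (padded) phones.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Hypothetical broad transcription of the Italian word "pane".
print(phonemes_to_diphones(["p", "a", "n", "e"]))
```

An utterance of N phones therefore needs N + 1 diphones. In a real system each unit name would index a stored waveform segment in the repertory, and the prosodic module would then adjust durations and pitch before concatenation.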
Keywords: Text-to-Speech System, Diphone synthesis, Italian language
Bibliographic reference. Balestri, Marcello / Lazzaretto, Stefano / Salza, Pier Luigi / Sandri, Stefano (1993): "The CSELT system for Italian text-to-speech synthesis", In EUROSPEECH'93, 2091-2094.