In this paper, we present an approach that allows a TTS system to dictate texts
to primary school pupils while conforming to the prosodic features
of this speaking style. The approach relies on a prosodic preprocessing
module, which avoids developing a dedicated system for such a limited task.
The proposal is based on two distinct elements: (i) the results of a preliminary
evaluation that gathered feedback from potential users; (ii) a corpus
study of 10 dictations annotated or uttered by 13 teachers or speech therapists
(10 and 3 respectively).
The preliminary evaluation focused on three points: the accuracy of the segmentation procedure, the size of the automatically calculated chunks, and the intelligibility of the synthesized voice. It showed that the chunks were judged too long and the speaking rate too fast. We therefore addressed these two issues while analyzing the collected data, comparing the observed realizations with the output of the speech synthesis system and the chunking algorithm. The results of the analysis led us to propose a module that, for this speaking style, produces an enriched text that the synthesizer can process to constrain unit selection and prosodic realization.
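To make the idea of an enriched text concrete, the following is a minimal sketch of a preprocessing step that shortens chunks and slows the speaking rate. It is not the module described in the paper: the function names, the word limit, the pause duration, and the use of SSML-style `prosody` and `break` tags as the enrichment format are all assumptions chosen for illustration.

```python
import re
from xml.sax.saxutils import escape


def chunk_text(text, max_words=4):
    """Split text after punctuation, then cap each chunk at max_words words.

    max_words=4 is an illustrative value, not one reported in the paper.
    """
    chunks = []
    # Split after a punctuation mark, keeping the mark with the preceding words.
    for segment in re.split(r"(?<=[,;:.!?])\s+", text.strip()):
        words = segment.split()
        for i in range(0, len(words), max_words):
            chunks.append(" ".join(words[i:i + max_words]))
    return chunks


def enrich_for_dictation(text, max_words=4, rate="slow", pause_ms=1500):
    """Wrap each chunk in SSML-style markup (assumed output format).

    Each chunk gets a slower speaking rate and is followed by a pause,
    mirroring the two issues raised by the preliminary evaluation.
    """
    parts = [
        f'<prosody rate="{rate}">{escape(chunk)}</prosody>'
        f'<break time="{pause_ms}ms"/>'
        for chunk in chunk_text(text, max_words)
    ]
    return "<speak>" + "".join(parts) + "</speak>"


if __name__ == "__main__":
    print(enrich_for_dictation("The little cat sleeps on the sofa, and the dog barks."))
```

In this sketch the chunk size and pause length are plain parameters; in practice they would be derived from the dictation corpus rather than fixed by hand.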
Bibliographic reference. Delais-Roussarie, Elisabeth / Lolive, Damien / Yoo, Hiyon / Barbot, Nelly / Rosec, Olivier (2014): "Adapting prosodic chunking algorithm and synthesis system to specific style: the case of dictation", In INTERSPEECH-2014, 1673-1677.