This paper presents a feasibility study of extracting a corpus of expressive speech from television soap opera dialogue. We investigated how dialogue can be extracted from television production tapes and what signal quality may be expected. We analysed the extent to which the scripts used in television production can serve as a transcription of the actual dialogue. From the scripts we also estimated how much dialogue speech can be expected for each character. The analysis is based on 7 seasons (1145 episodes) of a soap opera produced by the Flemish broadcaster VRT. The results show that processing 100 episodes can yield 3 hours of speech for one of the main characters, or 2.5 hours of dialogue between two of the main characters. The scripts, however, do not provide a quick win for automatic annotation of the corpus: they are not sufficiently accurate transcriptions of the dialogue actually spoken by the actors.
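To put the reported yields in perspective, the arithmetic can be sketched as below. This is only an illustration: the function names are hypothetical, and the extrapolation to all 1145 episodes assumes that the yield scales linearly with the number of episodes, which the abstract does not claim.

```python
# Illustrative arithmetic only; function names are hypothetical.
# ASSUMPTION: speech yield scales linearly with the number of episodes.

def minutes_per_episode(hours: float, episodes: int) -> float:
    """Average yield per episode, in minutes."""
    return hours * 60.0 / episodes


def extrapolate_hours(hours: float, episodes: int, target_episodes: int) -> float:
    """Linear extrapolation of the yield to a different number of episodes."""
    return hours * target_episodes / episodes


if __name__ == "__main__":
    # Figures taken from the abstract: yields per 100 processed episodes.
    reported = {
        "one main character": 3.0,
        "dialogue between two main characters": 2.5,
    }
    total_episodes = 1145  # 7 seasons, as stated in the abstract

    for label, hours in reported.items():
        per_ep = minutes_per_episode(hours, 100)
        total = extrapolate_hours(hours, 100, total_episodes)
        print(f"{label}: {per_ep:.2f} min/episode, "
              f"about {total:.1f} h over {total_episodes} episodes (linear assumption)")
```

Under that assumption, 3 hours per 100 episodes corresponds to 1.8 minutes of usable speech per episode, so the linear estimate should be read as a rough upper-level guide rather than a measured result.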
|VRT_Thuis_DialogueAndSoundEffectsEx.wav|An example of a track where dialogue is mixed with sound effects (steps, moving furniture).|
|VRT_Thuis_DialogueEx1.wav|A first example of very expressive speech.|
|VRT_Thuis_DialogueEx2.wav|A second example of very expressive speech.|
|VRT_Thuis_DialogueEx3.wav|An example of a less expressive conversation, with a low signal-to-noise ratio.|
Bibliographic reference. Rutten, Peter (2007): "Feasibility of constructing an expressive speech corpus from television soap opera dialogue", In INTERSPEECH-2007, 1306-1309.