8th Annual Conference of the International Speech Communication Association

Antwerp, Belgium
August 27-31, 2007

Feasibility of Constructing an Expressive Speech Corpus from Television Soap Opera Dialogue

Peter Rutten

VRT Medialab, Belgium

This paper presents a feasibility study on extracting a corpus of expressive speech from television soap opera dialogue. We investigated how dialogue can be extracted from television production tapes and what signal quality may be expected. We analysed to what extent the scripts used in television production can serve as a transcription of the actual dialogue. From the scripts we also estimated how much dialogue speech can be expected for each character. Our analysis is based on 7 seasons (1145 episodes) of a soap opera produced by the Flemish broadcaster VRT. The results show that processing 100 episodes can yield 3 hours of speech for one of the main characters, or 2.5 hours of dialogue between two of the main characters. The scripts, however, do not offer a quick win for automatic annotation of the corpus: they are not sufficiently accurate transcriptions of the dialogue actually spoken by the actors.
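To put the reported yield in per-episode terms, the arithmetic can be sketched as below. This is only an illustration of the figures quoted in the abstract (3 hours and 2.5 hours per 100 episodes). The function name `minutes_per_episode` is hypothetical, and the calculation assumes the yield is spread evenly across episodes, which the abstract does not state.

```python
def minutes_per_episode(total_hours: float, episodes: int) -> float:
    """Average minutes of speech per episode.

    Assumption (not from the paper): the yield is uniform across episodes.
    """
    if episodes <= 0:
        raise ValueError("episodes must be positive")
    return total_hours * 60 / episodes


# Figures quoted in the abstract, both per 100 processed episodes.
single_character = minutes_per_episode(3.0, 100)  # one main character
character_pair = minutes_per_episode(2.5, 100)    # dialogue between two main characters

print(f"One main character: {single_character:.1f} min per episode")
print(f"Two-character dialogue: {character_pair:.1f} min per episode")
```

Under that uniformity assumption, the quoted totals correspond to an average of under two minutes of usable speech per episode in either case.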

Acoustic Examples

VRT_Thuis_DialogueAndSoundEffectsEx.wav   An example of a track where dialogue is mixed with sound effects (steps, moving furniture).
VRT_Thuis_DialogueEx1.wav   A first example of very expressive speech.
VRT_Thuis_DialogueEx2.wav   A second example of very expressive speech.
VRT_Thuis_DialogueEx3.wav   An example of a less expressive conversation, with a low signal-to-noise ratio.

Bibliographic reference. Rutten, Peter (2007): "Feasibility of constructing an expressive speech corpus from television soap opera dialogue", in INTERSPEECH-2007, 1306-1309.