The Audio-Visual Face Cover Corpus consists of high-quality audio and video recordings of 10 native British English speakers wearing different types of 'facewear'. Speakers read aloud a set of 64 /C1VC2/ syllables embedded in a carrier phrase. Eighteen English consonants occurred twice each in onset and coda positions. Speakers recited the list nine times (1+8): once in a control condition (no facewear) and eight times while wearing a forensically relevant face covering. Audio was captured simultaneously via a headband microphone and two shotgun microphones placed in front of and behind the speaker. The speaker's head and shoulders were filmed from two camera angles, frontal and half-profile. In total, 6,120 utterances were recorded per device. This paper specifies the database design, introduces forensic-phonetic research utilising the data, and demonstrates the corpus's potential applications in related fields of study and in casework conducted by forensic speech scientists.
Index Terms: speech database, audio-visual, forensic speech science, facewear, disguise, acoustic phonetics, perception
Bibliographic reference. Fecher, Natalie (2012): "The 'audio-visual face cover corpus': investigations into audio-visual speech and speaker recognition when the speaker's face is occluded by facewear", in INTERSPEECH-2012, 2250-2253.