This paper describes the acquisition of PRAV, a phonetically rich audio-visual corpus. The PRAV Corpus contains audio and video recordings of 2368 sentences from the TIMIT corpus, each spoken by four subjects, making it the largest audio-visual corpus in the literature in terms of the number of sentences per subject. Visual features, comprising the coordinates of points along the contour of the subjects' lips, have been extracted for the entire PRAV Corpus using Active Appearance Models (AAM) and are made available along with the audio and video recordings. Because the subjects are Indian, PRAV is an ideal resource for audio-visual speech studies involving non-native English speakers. The paper also shows how the large number of sentences per subject makes the PRAV Corpus a significant dataset, highlighting its utility for a number of potential research problems, including visual speech synthesis and perception studies.
Cite as: Narwekar, A., Ghosh, P.K. (2017) PRAV: A Phonetically Rich Audio Visual Corpus. Proc. Interspeech 2017, 3747-3751, doi: 10.21437/Interspeech.2017-242
@inproceedings{narwekar17_interspeech,
  author={Abhishek Narwekar and Prasanta Kumar Ghosh},
  title={{PRAV: A Phonetically Rich Audio Visual Corpus}},
  year=2017,
  booktitle={Proc. Interspeech 2017},
  pages={3747--3751},
  doi={10.21437/Interspeech.2017-242}
}