Auditory-Visual Speech Processing 2007 (AVSP2007)
Kasteel Groenendaal, Hilvarenbeek, The Netherlands
Audiovisual speech processing has reached a stage of maturity where there are now numerous computational procedures needed to measure and assess multimodal signals. However, as is often the case, the results of these procedures are better known than the procedures themselves. This paper presents a MATLAB toolbox consisting of an extensive collection of tools we have developed over the past 10 years. These tools are not intended to be the final answer for multimodal speech analysis; rather they are presented as an easy-to-use and well-documented library whose scope is sufficiently broad to be useful to both experts and novices.
The toolbox includes procedures for measuring, organizing, modeling, and validating multiple streams of time-varying data, including acoustics and two- and three-dimensional motions of the speaker. In addition to functions for physical marker data and marker data derived from video, new functions have been implemented that incorporate optical flow techniques based on the OpenCV library. When complete, the toolbox will allow us to track human body gestures during speech from video noninvasively and to quantify the correspondences between different performance modalities within and across speakers.
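To make "quantifying the correspondences between different performance modalities" concrete, the following is a minimal sketch of one common approach: a lagged Pearson correlation between two time-aligned streams. It is not taken from the toolbox, which is written in MATLAB and whose function names are not given here. The names `pearson`, `best_lag`, `audio_env`, and `lip_motion`, the 60 Hz sampling rate, and the synthetic signals are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant signal carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def best_lag(x, y, max_lag):
    """Return (lag, r) with the largest |r| over lags in [-max_lag, max_lag].

    A positive lag means y trails x by that many samples.
    """
    best = (0, pearson(x, y))
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:len(y) + lag]
        r = pearson(xs, ys)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Hypothetical stand-ins for real measurements: an acoustic amplitude
# envelope and a lip-motion trace that follows it with a 4-sample delay.
fs = 60  # assumed common sampling rate in Hz
audio_env = [math.sin(2 * math.pi * 3 * t / fs) for t in range(120)]
delay = 4
lip_motion = [0.0] * delay + audio_env[:-delay]

lag, r = best_lag(audio_env, lip_motion, max_lag=10)
print(f"best lag: {lag} samples, r = {r:.3f}")
```

In practice the two streams would first be resampled to a common rate, and the same measure could be computed within a speaker or across speakers; other measures of correspondence are equally possible.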
Bibliographic reference. Barbosa, Adriano V. / Yehia, Hani C. / Vatikiotis-Bateson, Eric (2007): "MATLAB toolbox for audiovisual speech processing", In AVSP-2007, paper L5-1.