A system capable of interpreting affect from a speaking face must recognise and fuse signals from multiple cues. Building such a system requires the integration of software components performing tasks such as image registration, video segmentation, speech recognition and classification. These components tend to be idiosyncratic, purpose-built, and driven by scripts and textual configuration files, so integrating them with the degree of flexibility needed for full multimodal affective recognition is challenging. We discuss the key requirements and describe a system for multimodal affect sensing that integrates such software components and meets these requirements.
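The integration problem described above can be pictured as wrapping each idiosyncratic, script-driven tool behind one uniform interface so heterogeneous components can be chained and their outputs fused. The following is a minimal, hypothetical sketch of that idea; the component names and the `Pipeline` abstraction are illustrative assumptions, not the authors' actual framework.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Component:
    """Wraps one purpose-built tool behind a uniform call interface."""
    name: str
    run: Callable[[dict], dict]  # reads shared state, returns new cues

@dataclass
class Pipeline:
    """Chains wrapped components and merges each one's output cues."""
    components: list = field(default_factory=list)

    def add(self, component: Component) -> "Pipeline":
        self.components.append(component)
        return self

    def process(self, frame: dict) -> dict:
        state = dict(frame)          # copy so input is left untouched
        for c in self.components:
            state.update(c.run(state))  # fuse this component's cues
        return state

# Toy stand-ins for real components (all names are illustrative only).
pipeline = (
    Pipeline()
    .add(Component("video_segmenter", lambda s: {"segments": [s["video"]]}))
    .add(Component("speech_recogniser", lambda s: {"words": s["audio"].split()}))
    .add(Component("affect_classifier",
                   lambda s: {"affect": "neutral" if s["words"] else "unknown"}))
)

result = pipeline.process({"video": "clip01", "audio": "hello there"})
print(result["affect"])
```

A uniform `dict`-in, `dict`-out contract is one simple way to let components written around scripts and configuration files interoperate without knowing about one another.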
Bibliographic reference. McIntyre, Gordon / Goecke, Roland (2008): "A composite framework for affective sensing", In INTERSPEECH-2008, 2767-2770.