Distant speech recognition in real-world environments remains a challenging problem. A particularly interesting topic is multi-channel processing with microphones distributed throughout a home environment. This paper presents an initiative to address the challenges of this scenario: an experimental recognition framework, comprising a multi-room, multi-channel corpus and the accompanying evaluation tools, is made publicly available. The overall goal is to provide a common platform for comparing state-of-the-art algorithms, sharing ideas across research communities, and integrating the components of a realistic distant-talking recognition chain, e.g., voice activity detection, speech/feature enhancement, channel selection and fusion, and model compensation. The recordings include spoken commands (derived from the well-known GRID corpus) mixed with other acoustic events occurring in different rooms of a real apartment. The paper gives a detailed description of the data, tasks, and baseline results, discusses the potential and limits of the approach, and highlights the impact of individual modules on recognition performance.
Bibliographic reference. Matassoni, Marco / Astudillo, Ramón Fernandez / Katsamanis, Athanasios / Ravanelli, Mirco (2014): "The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones", In INTERSPEECH-2014, 1613-1617.