15th Annual Conference of the International Speech Communication Association

September 14-18, 2014

The DIRHA-GRID Corpus: Baseline and Tools for Multi-Room Distant Speech Recognition Using Distributed Microphones

Marco Matassoni (1), Ramón Fernandez Astudillo (2), Athanasios Katsamanis (3), Mirco Ravanelli (1)

(1) FBK, Italy
(2) INESC-ID Lisboa, Portugal
(3) NTUA, Greece

Distant speech recognition in real-world environments remains a challenging problem, and a particularly interesting topic is multi-channel processing with distributed microphones in home environments. This paper presents an initiative aimed at addressing the challenges of this scenario: an experimental recognition framework, comprising a multi-room, multi-channel corpus and the accompanying evaluation tools, is made publicly available. The overall goal is to provide a common platform for comparing state-of-the-art algorithms, sharing ideas across research communities, and integrating several components into a realistic distant-talking recognition chain (e.g., voice activity detection, speech/feature enhancement, channel selection and fusion, and model compensation). The recordings include spoken commands, derived from the well-known GRID corpus, mixed with other acoustic events occurring in different rooms of a real apartment. The paper provides a detailed description of the data, tasks, and baseline results, discusses the potential and limitations of the approach, and highlights the impact of individual modules on recognition performance.

Bibliographic reference: Matassoni, Marco / Astudillo, Ramón Fernandez / Katsamanis, Athanasios / Ravanelli, Mirco (2014): "The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones", In INTERSPEECH-2014, 1613-1617.