ISCA Archive SPSC 2021

ROXSD: a Simulated Dataset of Communication in Organized Crime

Kvetoslav Maly, Gerhard Backfried, Francesco Calderoni, Jan "Honza" Černocký, Erinc Dikici, Maël Fabien, Jan Hořínek, Joshua Hughes, Miroslav Janošík, Marek Kovac, Petr Motlicek, Hoang H. Nguyen, Shantipriya Parida, Johan Rohdin, Miroslav Skácel, Sergej Zerr, Dietrich Klakow, Dawei Zhu, Aravind Krishnan

Criminal investigations contain sensitive and confidential material and are nonpublic by nature. Access to investigation data is very limited and restricted to selected groups of individuals. Even for research purposes, such data typically cannot be accessed freely. Within criminal investigations, data is still processed manually to a large extent. Solutions that automate this processing, or even individual processing steps, can be expected to have a significant impact on the work of Law Enforcement Agencies (LEAs). Automation may be key to handling large and complex amounts of data efficiently under the typical operating conditions of LEAs. This paper introduces the ROXANNE Simulated Dataset (ROXSD), a dataset with unique properties prepared by the ROXANNE Project with assistance from several LEAs, to facilitate the development and evaluation of novel tools and technologies for criminal investigations. ROXSD consists of a set of simulated intercepted telephone conversations in a variety of languages. The story follows a realistic setting and includes the conditions and constraints of a real investigation. The network topology corresponding to the conversations was created by partner LEAs to reflect various typical organized crime groups. The conversations have been carefully transcribed and annotated in the original language and in English. The dataset is expected to provide a sound basis for further research and is available for download to researchers under a signed agreement.

doi: 10.21437/SPSC.2021-7

Cite as: Maly, K., Backfried, G., Calderoni, F., Černocký, J., Dikici, E., Fabien, M., Hořínek, J., Hughes, J., Janošík, M., Kovac, M., Motlicek, P., Nguyen, H.H., Parida, S., Rohdin, J., Skácel, M., Zerr, S., Klakow, D., Zhu, D., Krishnan, A. (2021) ROXSD: a Simulated Dataset of Communication in Organized Crime. Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication, 32-36, doi: 10.21437/SPSC.2021-7

  author={Kvetoslav Maly and Gerhard Backfried and Francesco Calderoni and Jan "Honza" Černocký and Erinc Dikici and Maël Fabien and Jan Hořínek and Joshua Hughes and Miroslav Janošík and Marek Kovac and Petr Motlicek and Hoang H. Nguyen and Shantipriya Parida and Johan Rohdin and Miroslav Skácel and Sergej Zerr and Dietrich Klakow and Dawei Zhu and Aravind Krishnan},
  title={{ROXSD: a Simulated Dataset of Communication in Organized Crime}},
  booktitle={Proc. 2021 ISCA Symposium on Security and Privacy in Speech Communication},