Publications

Cybersecurity datasets: A mirage

Abstract

Cybersecurity research has a great, unmet need for datasets that meet several specific requirements: (1) datasets must contain real security and “peace-time” events, (2) researchers should be able to adequately access both recent and curated datasets, (3) some reasonable number of datasets must be accurately labeled with regard to security events, (4) some datasets should have varying levels of event sophistication, and (5) it must be possible to cross-correlate datasets with other datasets from the public or private domain. This paper examines these requirements, discusses why they are difficult to meet and why they are crucial for advancements in cybersecurity research, and outlines some directions forward.

Date
September 12, 2025
Authors
Jelena Mirkovic, Stephen Hayne, Michalis Kallitsis, Wes Hardaker, John Heidemann, Christos Papadopoulos, Devkishen Sisodia
Conference
NSF Workshop on Overcoming Measurement Barriers to Internet Research (WOMBIR 2021)