Seminars and Events

Cybersecurity Seminar Series

Data-driven Security using PySpark

Event Details

Abstract:

Organizations with large log volumes, custom log types, many different logs, or special analysis needs have moved beyond the capabilities of SIEMs that offer packaged analysis or special-purpose languages focused primarily on search and matching use cases. At the same time, machine learning expertise is becoming deeper and broader, particularly in large organizations with dedicated Data Engineering and Data Science teams. These trends are driving security organizations toward data-driven analysis that requires highly flexible interfaces such as Python notebooks.

Our Detection Engineering team has been using our PySpark platform to build streaming pipelines that cover not only basic rule-based use cases but also full ML models registered with MLflow. In this talk we’ll discuss the underlying Python framework we’ve built for our own operational needs, and how we’re working toward releasing code that other organizations can use.

Speaker Bio

Markus De Shon, PhD (Physics, Georgia Tech, 1998), has worked in information security since 2000 at SecureWorks, CERT, Google, and Netflix. Since 2020, as Director of Detection Engineering at Databricks, he has led a team developing a comprehensive framework for data-driven detection engineering.