Where Detection Engineering Meets Data and ML: A Blueprint for Modern Threat Detection

Manju Lalwani

In an era where cyber threats are more evasive, automated, and persistent than ever, relying solely on traditional detection methods just doesn't cut it. As defenders, we’re expected to outsmart threat actors using an ever-growing sea of logs, alerts, and behavioral data. But what if we could make that sea a little easier to navigate?

In this post, I want to explore how Detection Engineering, Data Engineering, and Machine Learning can come together to build smarter, more adaptive threat detection pipelines. This is the space I work in — and honestly, it’s where I think the future of blue teaming is headed.

🔍 Detection Engineering: The Foundation

Detection engineering is where most threat detection begins. Writing and tuning rules, signatures, queries — it’s our bread and butter.

Whether you're working with Sigma, Splunk SPL, or Elastic KQL, detection engineering is about translating known adversary behavior (tactics, techniques, and procedures, or TTPs) into actionable alerts. The challenge? It’s reactive by design: a rule can only fire on behavior someone has already seen and described, so you’re always playing catch-up.

The real problem: even the best-written rule is useless if it runs on bad or incomplete data.
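To make that concrete, here is a minimal sketch of a rule expressed as code, and of what happens when the data underneath it is incomplete. The field names (`image`, `command_line`) and the rule itself are assumptions for illustration, not a real schema or a production-grade detection. PowerShell also accepts other abbreviations of the flag, which this simplified check ignores.

```python
from typing import Any

# Hypothetical rule: flag PowerShell launched with an encoded command.
# Field names are assumed for illustration, not taken from a real schema.
ENCODED_FLAGS = ("-enc", "-encodedcommand")


def matches_encoded_powershell(event: dict[str, Any]) -> bool:
    """Return True when a process event looks like encoded PowerShell."""
    image = str(event.get("image", "")).lower()
    command_line = str(event.get("command_line", "")).lower()
    if not image.endswith("powershell.exe"):
        return False
    # Compare whole tokens so "-enc" does not match inside another argument.
    return any(flag in command_line.split() for flag in ENCODED_FLAGS)


complete = {
    "image": r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    "command_line": "powershell.exe -enc SQBFAFgA",
}
# Same activity, but the collector dropped the command line.
incomplete = {"image": complete["image"]}

print(matches_encoded_powershell(complete))    # True
print(matches_encoded_powershell(incomplete))  # False: the rule silently misses
```

The rule logic is identical in both calls. The only thing that changed is the quality of the event it was given.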

🛠️ Data Engineering: The Enabler

This is where data engineering comes into play. If detection logic is the engine, data engineering is the fuel system. Clean, enriched, timely data is essential to power accurate detections.

Here’s what that involves:

  • Ingesting logs from various sources: Windows Event Logs, Zeek, cloud APIs, endpoint telemetry

  • Transforming: parsing, normalizing, filtering

  • Enriching: adding context like asset ownership, geo-location, user identity

  • Storing: feeding this into a data lake, SIEM, or a custom pipeline
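The transform and enrich steps above can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: the common schema, the `ASSET_OWNERS` lookup table, and the sample log line are all hypothetical. In practice the lookup would come from an asset inventory or CMDB, and the output would be written to your store of choice.

```python
import json
from datetime import datetime, timezone

# Hypothetical asset inventory used for enrichment.
ASSET_OWNERS = {"10.0.0.5": "payments-team", "10.0.0.9": "it-helpdesk"}


def normalize(raw_line: str) -> dict:
    """Parse one JSON log line and map it onto a small common schema."""
    record = json.loads(raw_line)
    return {
        # Convert epoch seconds to an ISO 8601 UTC timestamp.
        "timestamp": datetime.fromtimestamp(record["ts"], tz=timezone.utc).isoformat(),
        # Accept either a Zeek-style or a generic source address field.
        "src_ip": record.get("id.orig_h") or record.get("source_ip"),
        "user": (record.get("user") or "unknown").lower(),
        "action": record.get("action", "unknown"),
    }


def enrich(event: dict) -> dict:
    """Attach asset ownership so an analyst knows who to contact."""
    return {**event, "asset_owner": ASSET_OWNERS.get(event["src_ip"], "unassigned")}


raw = '{"ts": 1700000000, "source_ip": "10.0.0.5", "user": "Alice", "action": "login"}'
event = enrich(normalize(raw))
print(event)
```

Normalizing the user name and timestamp up front means every downstream rule or model can rely on one format instead of handling each source's quirks.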

Tools I’ve seen or used: Kafka, Logstash, Fluentd, Apache Beam, pandas, Snowflake, and good old Python scripts.

Data engineering is the bridge that makes raw logs useful — not just searchable, but actionable.

🤖 Machine Learning: The Accelerator

This is where things get spicy.

Machine learning in threat detection isn’t about replacing rules; it’s about augmenting them. ML helps where rule-based logic falls short: spotting subtle anomalies and context-aware outliers, powering user and entity behavior analytics (UEBA), and automating triage.

Some real-world use cases:

  • Detecting insider threats based on unusual access patterns

  • Detecting anomalies in process trees or spikes in authentication activity

  • Clustering alerts to reduce noise

  • Predicting false positives with classification models
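As a small illustration of the authentication-spike case, here is a sketch that flags unusual hours using a robust z-score (median and median absolute deviation). This is a simple statistical baseline standing in for a trained model, not one of the libraries I mention in this post. The hourly counts and the 3.5 threshold are made-up values for the example.

```python
from statistics import median


def robust_z_scores(values: list[float]) -> list[float]:
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to scale by; treat everything as normal in this sketch.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


# Hypothetical failed-login counts per hour for one account.
failed_logins_per_hour = [4, 6, 5, 7, 5, 6, 4, 5, 90, 6]
THRESHOLD = 3.5  # assumed cut-off; tune against your own data

scores = robust_z_scores(failed_logins_per_hour)
anomalous_hours = [i for i, s in enumerate(scores) if abs(s) > THRESHOLD]
print(anomalous_hours)  # [8]
```

The median-based score is used here because a single large spike barely moves the median, whereas it would inflate a mean and standard deviation and partly hide itself.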

Tools I mess with: scikit-learn, PyOD, XGBoost, pandas, and sometimes Jupyter notebooks to test quick ideas. And yes, a lot of time is spent just cleaning features and validating results.
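Validation can start very simply. The sketch below compares a model's "this alert is a false positive" predictions with analyst triage labels and reports precision and recall. The labels are invented for the example, and `precision_recall` is a hypothetical helper written out by hand to keep the block dependency-free.

```python
def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Compute precision and recall for boolean predictions against labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)       # predicted FP, analyst agreed
    fp = sum(p and not a for p, a in pairs)   # predicted FP, but it was real
    fn = sum(a and not p for p, a in pairs)   # missed a true false positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# True means "this alert is a false positive".
model_says_fp = [True, True, False, True, False, False]
analyst_says_fp = [True, False, False, True, True, False]

precision, recall = precision_recall(model_says_fp, analyst_says_fp)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

For this use case, low precision is the dangerous failure: every alert wrongly predicted to be a false positive is a real detection that might get suppressed.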

🧬 Putting It All Together

Imagine this pipeline:

  1. Logs come in from endpoints and cloud sources

  2. Data is parsed & enriched via a pipeline (e.g., Python + dbt)

  3. Features are extracted for ML models

  4. Models run inference on events in real time or in batch

  5. Output is sent to a SIEM/alerting system — alongside rule-based detections
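The five steps above can be sketched end to end as a toy pipeline. Everything here is an assumption made for illustration: the log format, the `privileged_users` set, the failed-login feature, and especially `score`, which is a plain ratio standing in for real model inference.

```python
import json
from collections import Counter


def parse(raw_line: str) -> dict:
    """Step 2a: parse a raw JSON log line."""
    return json.loads(raw_line)


def enrich(event: dict, privileged_users: set[str]) -> dict:
    """Step 2b: add context (here, whether the user is privileged)."""
    return {**event, "privileged": event["user"] in privileged_users}


def extract_features(events: list[dict]) -> dict[str, int]:
    """Step 3: count failed logins per user."""
    return dict(Counter(e["user"] for e in events if e["outcome"] == "failure"))


def score(features: dict[str, int], baseline: float) -> dict[str, float]:
    """Step 4: stand-in for model inference, a ratio to an assumed baseline."""
    return {user: count / baseline for user, count in features.items()}


def rule_hits(events: list[dict]) -> set[str]:
    """Rule-based detection: any failed login by a privileged user."""
    return {e["user"] for e in events if e["privileged"] and e["outcome"] == "failure"}


# Step 1: logs arrive (hard-coded here).
raw_logs = [
    '{"user": "alice", "outcome": "failure"}',
    '{"user": "alice", "outcome": "failure"}',
    '{"user": "alice", "outcome": "failure"}',
    '{"user": "bob", "outcome": "success"}',
    '{"user": "root", "outcome": "failure"}',
]
events = [enrich(parse(line), privileged_users={"root"}) for line in raw_logs]
scores = score(extract_features(events), baseline=1.0)
rules = rule_hits(events)

# Step 5: one alert per user, carrying both the model score and the rule hit.
alerts = [
    {"user": user, "model_score": scores.get(user, 0.0), "rule_hit": user in rules}
    for user in sorted(set(scores) | rules)
    if scores.get(user, 0.0) >= 3.0 or user in rules
]
print(alerts)
```

The model-style score surfaces alice and the rule surfaces root. Neither signal alone would have raised both, which is the point of sending them to the same place.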

The result? More context-rich alerts, fewer false positives, and a better shot at catching sophisticated attackers early.

⚠️ Challenges (and Why This Isn’t Magic)

  • Bad data = bad detections (ML or not)

  • Feature engineering takes time — and domain knowledge

  • ML models drift — threat behavior and environments change

  • Human-in-the-loop is still necessary for validation and tuning
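Drift in particular is something you can watch for with very little code. One option, sketched below, is the Population Stability Index (PSI), which compares how a feature was distributed when the model was built with how it looks now. The bin edges and sample values are invented. A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but treat that as a starting point rather than a standard.

```python
import math


def psi(baseline: list[float], current: list[float], edges: list[float]) -> float:
    """Population Stability Index between two samples over shared bin edges."""

    def proportions(sample: list[float]) -> list[float]:
        counts = [0] * (len(edges) + 1)
        for value in sample:
            # The bin index is the number of edges the value is at or above.
            counts[sum(value >= edge for edge in edges)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Hypothetical feature: logins per user per day, at training time and today.
edges = [5, 10]
training_sample = [1, 2, 3, 4, 6, 7, 8, 12]
todays_sample = [6, 7, 8, 9, 11, 12, 13, 14]

drift = psi(training_sample, todays_sample, edges)
print(f"PSI={drift:.2f}", "-> retrain or review" if drift > 0.25 else "-> stable")
```

A check like this does not say why the data moved. It only tells a human it is time to look, which is where the human-in-the-loop point comes back in.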

🧠 Final Thoughts

If you’re a blue teamer and you're only focused on writing detections, you're only seeing part of the picture. The future of detection is hybrid — blending threat intel, solid data pipelines, and adaptive models.

This is just the beginning. In upcoming posts, I’ll break down:

  • Real detection pipelines I’ve built/tested

  • ML models that actually worked (and ones that failed)

  • Lessons learned scaling detections in noisy environments

💬 What do you think?

If you're working in a similar space, I’d love to hear from you. How are you mixing data, ML, and detections in your environment?


Written by

Manju Lalwani

I’m a Security Engineer with 12+ years of experience spanning the security spectrum, from Application Security and DLP to Email Security, Threat Modeling, and Detection Engineering. I started out in AppSec and secure design reviews, grew through DLP and email controls, and evolved into a threat-focused engineer obsessed with solving real-world problems using data, cloud, and smart detection.

🔍 These days, I specialize in building scalable, threat-informed detection pipelines for cloud-native, container-heavy environments, where Detection Engineering meets Data Engineering. Whether it’s turning packets into signal or crafting long-term log ingestion strategies, I love working at the intersection of:

  • 💡 Threat detection

  • 🛰️ Cloud & Kubernetes

  • 🧠 Data pipelines

With hands-on experience across GCP, AWS, Kubernetes, Kafka, and behavioral analytics, I bring deep technical understanding paired with a strong sense of mission.

I'm a huge advocate for women in cybersecurity, and I genuinely love what I do. This isn’t just a career; it’s a passion. Whether it’s breaking down a complex detection problem, helping someone break into the field, or pushing for better representation, I bring everything I’ve got. I believe in doing meaningful work, mentoring with intention, and showing up as my full self in the field.