Where Detection Engineering Meets Data and ML: A Blueprint for Modern Threat Detection

In an era where cyber threats are more evasive, automated, and persistent than ever, relying solely on traditional detection methods just doesn't cut it. As defenders, we’re expected to outsmart threat actors using an ever-growing sea of logs, alerts, and behavioral data. But what if we could make that sea a little easier to navigate?
In this post, I want to explore how Detection Engineering, Data Engineering, and Machine Learning can come together to build smarter, more adaptive threat detection pipelines. This is the space I work in — and honestly, it’s where I think the future of blue teaming is headed.
🔍 Detection Engineering: The Foundation
Detection engineering is where most threat detection begins. Writing and tuning rules, signatures, queries — it’s our bread and butter.
Whether you're working with Sigma, Splunk SPL, or Elastic KQL, detection engineering is about translating known adversary behavior (TTPs) into actionable alerts. The challenge? Rules built from known behavior are reactive by design: you can only write them after a technique has been observed, so you're always playing catch-up.
The real problem: even the best-written rule is useless if it runs on bad or incomplete data.
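To make that concrete, here's a minimal sketch of a rule expressed as a plain Python predicate. The rule itself (an Office application spawning a script interpreter) and the field names `parent_image` and `image` are assumptions chosen for illustration, not a specific Sigma rule or log schema. The third event shows the data problem: when the parent field is missing, the rule quietly never fires.

```python
# Hypothetical rule: flag Office applications that spawn a script interpreter.
# Field names ("parent_image", "image") are illustrative assumptions.
OFFICE_PARENTS = {"winword.exe", "excel.exe", "powerpnt.exe"}
SCRIPT_CHILDREN = {"powershell.exe", "cmd.exe", "wscript.exe", "cscript.exe"}


def basename(path: str) -> str:
    """Return the lowercased file name from a Windows-style path."""
    return path.lower().rsplit("\\", 1)[-1]


def office_spawns_shell(event: dict) -> bool:
    """Return True when a process-creation event matches the rule."""
    parent = basename(str(event.get("parent_image", "")))
    child = basename(str(event.get("image", "")))
    return parent in OFFICE_PARENTS and child in SCRIPT_CHILDREN


events = [
    # Matches: Word launching cmd.exe.
    {"parent_image": r"C:\Program Files\Microsoft Office\WINWORD.EXE",
     "image": r"C:\Windows\System32\cmd.exe"},
    # No match: Explorer launching cmd.exe is ordinary user activity.
    {"parent_image": r"C:\Windows\explorer.exe",
     "image": r"C:\Windows\System32\cmd.exe"},
    # Incomplete data: no parent field, so the rule cannot fire.
    {"image": r"C:\Windows\System32\powershell.exe"},
]

alerts = [e for e in events if office_spawns_shell(e)]
print(f"{len(alerts)} alert(s) from {len(events)} events")
```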
🛠️ Data Engineering: The Enabler
This is where data engineering comes into play. If detection logic is the engine, data engineering is the fuel system. Clean, enriched, timely data is essential to power accurate detections.
Here’s what that involves:
Ingesting logs from various sources: Windows Event Logs, Zeek, cloud APIs, endpoint telemetry
Transforming: parsing, normalizing, filtering
Enriching: adding context like asset ownership, geo-location, user identity
Storing: feeding this into a data lake, SIEM, or a custom pipeline
Tools I’ve seen or used: Kafka, Logstash, Fluentd, Apache Beam, pandas, Snowflake, and good old Python scripts.
Data engineering is the bridge that makes raw logs useful — not just searchable, but actionable.
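Here's a rough sketch of the transform and enrich steps using pandas. The event fields, the asset inventory, and all values are made up for illustration; a real pipeline would read from a stream or a data lake rather than an in-memory list.

```python
import pandas as pd

# Hypothetical raw authentication events with inconsistent naming and one bad row.
raw_events = [
    {"ts": "2024-05-01T10:00:00Z", "User": "ALICE", "src_ip": "10.0.0.5", "host": "ws-01"},
    {"ts": "2024-05-01T10:05:00Z", "User": "bob", "src_ip": "10.0.0.9", "host": "srv-02"},
    {"ts": "not-a-timestamp", "User": "carol", "src_ip": "10.0.0.7", "host": "ws-01"},
]

# Hypothetical asset inventory used for enrichment.
assets = pd.DataFrame({
    "host": ["ws-01", "srv-02"],
    "owner": ["it-helpdesk", "platform-team"],
    "criticality": ["low", "high"],
})

df = pd.DataFrame(raw_events)

# Transform: normalize field names, parse timestamps, lowercase usernames.
df = df.rename(columns={"ts": "timestamp", "User": "user"})
df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce", utc=True)
df["user"] = df["user"].str.lower()

# Filter: drop rows whose timestamp could not be parsed.
df = df.dropna(subset=["timestamp"])

# Enrich: attach asset ownership and criticality to each event.
enriched = df.merge(assets, on="host", how="left")
print(enriched[["timestamp", "user", "host", "owner", "criticality"]])
```

A left join keeps events from hosts that are missing from the inventory, which is usually what you want: an unknown asset is itself worth knowing about.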
🤖 Machine Learning: The Accelerator
This is where things get spicy.
Machine learning in threat detection isn’t about replacing rules — it’s about augmenting them. ML helps where rule-based logic falls short: subtle anomalies, context-aware outliers, user/entity behavior analytics (UEBA), and automating triage.
Some real-world use cases:
Detecting insider threats based on unusual access patterns
Anomaly detection on process trees or authentication volume (for example, sudden login spikes)
Clustering alerts to reduce noise
Predicting false positives with classification models
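As one example, the authentication-spike case could be sketched with scikit-learn's `IsolationForest`. The data below is synthetic, and the two features (login count and failed-login count per user-hour) plus the `contamination` value are assumptions for illustration, not tuned settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic features per user-hour: [login_count, failed_login_count].
normal = np.column_stack([rng.poisson(5, 200), rng.poisson(1, 200)])
spike = np.array([[200, 180]])  # one obvious brute-force-like burst
X = np.vstack([normal, spike])

# contamination is the assumed share of anomalies; 1% is a placeholder.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
model.fit(X)

scores = model.score_samples(X)  # lower means more anomalous
flags = model.predict(X)         # -1 = outlier, 1 = inlier

most_anomalous = int(np.argmin(scores))
print("most anomalous row:", X[most_anomalous], "flag:", flags[most_anomalous])
```

On real data the anomalies are rarely this clean, so the output is better treated as a ranking for analysts than as a verdict.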
Tools I mess with: scikit-learn, PyOD, XGBoost, pandas, and sometimes Jupyter notebooks to test quick ideas. And yes, a lot of time is spent just cleaning features and validating results.
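For the false-positive prediction use case, a minimal sketch with a held-out validation split might look like the following. Everything here is synthetic: the features (`hits_24h`, `off_hours`) and the analyst labels are hypothetical, and scikit-learn's `LogisticRegression` stands in for whatever classifier you prefer (XGBoost would slot in the same way).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 300

# Hypothetical alert features:
#   hits_24h  - times the same rule fired on the same host in the last 24 hours
#   off_hours - 1 if the alert fired outside business hours
# Label: 1 = analyst closed the alert as a false positive, 0 = true positive.
fp_features = np.column_stack([rng.poisson(40, n), rng.binomial(1, 0.2, n)])
tp_features = np.column_stack([rng.poisson(3, n), rng.binomial(1, 0.7, n)])
X = np.vstack([fp_features, tp_features])
y = np.array([1] * n + [0] * n)

# Hold out a test set so the accuracy reflects unseen alerts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))

# Probability that a new, very noisy alert is a false positive.
new_alert = np.array([[55, 0]])
fp_probability = clf.predict_proba(new_alert)[0, 1]
print(f"held-out accuracy: {accuracy:.2f}, P(false positive): {fp_probability:.2f}")
```

A score like this is safer as a way to rank the triage queue than as a reason to auto-close alerts.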
🧬 Putting It All Together
Imagine this pipeline:
Logs come in from endpoints and cloud sources
Data is parsed & enriched via a pipeline (e.g., Python + dbt)
Features are extracted for ML models
Models run inference on events in real time or in batch
Output is sent to a SIEM/alerting system — alongside rule-based detections
The result? More context-rich alerts, fewer false positives, and a better shot at catching sophisticated attackers early.
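The steps above can be sketched end to end in a few functions. This is a toy under stated assumptions: the log format, the `OWNERS` inventory, the "unknown asset" rule, and the threshold are all hypothetical, and a simple median-based outlier score stands in for real model inference.

```python
from statistics import median

# Hypothetical asset inventory used for enrichment.
OWNERS = {"ws-01": "it-helpdesk", "ws-02": "finance"}


def parse(line: str) -> dict:
    """Parse a made-up 'timestamp user host bytes_out' log line."""
    timestamp, user, host, bytes_out = line.split()
    return {"timestamp": timestamp, "user": user, "host": host,
            "bytes_out": int(bytes_out)}


def enrich(event: dict) -> dict:
    """Attach the asset owner, or 'unknown' if the host is not in inventory."""
    return {**event, "owner": OWNERS.get(event["host"], "unknown")}


def rule_hit(event: dict) -> bool:
    """Hypothetical rule: activity from a host missing from the inventory."""
    return event["owner"] == "unknown"


def anomaly_scores(events: list[dict]) -> list[float]:
    """Stand-in for model inference: distance from the median in MAD units."""
    values = [e["bytes_out"] for e in events]
    center = median(values)
    mad = median(abs(v - center) for v in values) or 1.0
    return [abs(v - center) / mad for v in values]


def run_pipeline(lines: list[str], threshold: float = 10.0) -> list[dict]:
    """Parse, enrich, score, and merge rule and model output into alerts."""
    events = [enrich(parse(line)) for line in lines]
    alerts = []
    for event, score in zip(events, anomaly_scores(events)):
        sources = []
        if rule_hit(event):
            sources.append("rule")
        if score > threshold:
            sources.append("model")
        if sources:
            alerts.append({**event, "anomaly_score": score, "sources": sources})
    return alerts


raw_lines = [
    "2024-05-01T10:00:00Z alice ws-01 1000",
    "2024-05-01T10:01:00Z bob ws-02 1200",
    "2024-05-01T10:02:00Z carol ws-01 900",
    "2024-05-01T10:03:00Z dave ws-02 1100",
    "2024-05-01T10:04:00Z erin ws-09 1000",   # host not in inventory
    "2024-05-01T10:05:00Z alice ws-01 50000", # unusually large transfer
]

alerts = run_pipeline(raw_lines)
for alert in alerts:
    print(alert["user"], alert["host"], alert["sources"])
```

Keeping a `sources` field on each alert lets the SIEM show whether a rule, the model, or both contributed, which helps analysts decide how much to trust it.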
⚠️ Challenges (and Why This Isn’t Magic)
Bad data = bad detections (ML or not)
Feature engineering takes time — and domain knowledge
ML models drift — threat behavior and environments change
Human-in-the-loop is still necessary for validation and tuning
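One way to keep an eye on drift is to compare a feature's recent distribution against the one the model was trained on. Here's a sketch using the population stability index (PSI) with NumPy. The data is synthetic, and the 0.1 and 0.25 cutoffs are a commonly quoted rule of thumb, not something validated for any particular environment.

```python
import numpy as np


def population_stability_index(baseline, current, bins: int = 10) -> float:
    """PSI between two samples, using quantile bins taken from the baseline."""
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior bin edges; values beyond them fall into the first or last bin.
    inner_edges = np.quantile(baseline, np.linspace(0, 1, bins + 1)[1:-1])
    base_counts = np.bincount(np.searchsorted(inner_edges, baseline), minlength=bins)
    curr_counts = np.bincount(np.searchsorted(inner_edges, current), minlength=bins)

    # Clip to avoid log(0) when a bin is empty.
    eps = 1e-6
    base_pct = np.clip(base_counts / len(baseline), eps, None)
    curr_pct = np.clip(curr_counts / len(current), eps, None)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))


rng = np.random.default_rng(0)
training_feature = rng.normal(0.0, 1.0, 5000)  # feature values at training time
stable_week = rng.normal(0.0, 1.0, 5000)       # same behavior, new sample
shifted_week = rng.normal(1.5, 1.0, 5000)      # environment has changed

psi_stable = population_stability_index(training_feature, stable_week)
psi_shifted = population_stability_index(training_feature, shifted_week)

# Rule-of-thumb reading (assumption): < 0.1 stable, > 0.25 worth a retrain review.
print(f"stable: {psi_stable:.3f}, shifted: {psi_shifted:.3f}")
```

A high PSI doesn't say the model is wrong, only that its inputs look different from training, which is a good prompt for a human to review and possibly retrain.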
🧠 Final Thoughts
If you’re a blue teamer and you're only focused on writing detections, you're only seeing part of the picture. The future of detection is hybrid — blending threat intel, solid data pipelines, and adaptive models.
This is just the beginning. In upcoming posts, I’ll break down:
Real detection pipelines I’ve built/tested
ML models that actually worked (and ones that failed)
Lessons learned scaling detections in noisy environments
💬 What do you think?
If you're working in a similar space — I’d love to hear from you. How are you mixing data, ML, and detections in your environment?
Written by
Manju Lalwani
I’m a Security Engineer with 12+ years of experience spanning across the security spectrum — from Application Security and DLP to Email Security, Threat Modeling, and Detection Engineering. I started out in AppSec and secure design reviews, grew through DLP and email controls, and evolved into a threat-focused engineer obsessed with solving real-world problems using data, cloud, and smart detection. 🔍 These days, I specialize in building scalable, threat-informed detection pipelines for cloud-native, container-heavy environments — where Detection Engineering meets Data Engineering. Whether it’s turning packets into signal or crafting long-term log ingestion strategies, I love working at the intersection of: 💡 Threat detection 🛰️ Cloud & Kubernetes 🧠 Data pipelines With hands-on experience across GCP, AWS, Kubernetes, Kafka, and behavioral analytics, I bring deep technical understanding paired with a strong sense of mission. I'm a huge advocate for women in cybersecurity, and I genuinely love what I do. This isn’t just a career — it’s a passion. Whether it’s breaking down a complex detection problem, helping someone break into the field, or pushing for better representation, I bring everything I’ve got. I believe in doing meaningful work, mentoring with intention, and showing up as my full self in the field.