Anomaly detection techniques - Scoring and performance

Scoring Mechanisms in Anomaly Detection
Z-Score Method
Calculation: For a given feature (e.g., wrong_fragment), the z-score is computed as:

$$z = \frac{x - \mu}{\sigma}$$

where x is the data point, μ is the mean, and σ is the standard deviation.
Thresholding: Data points with |z| > 2 are considered anomalies.
Labeling: Assign 1 to anomalies and 0 to normal points.
Evaluation: Using the confusion matrix, the model's performance is assessed. For instance, a high number of false negatives indicates that many anomalies were not detected.
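A minimal sketch of this approach, assuming a pandas DataFrame df that holds the wrong_fragment feature and a binary ground-truth column label (1 = anomaly); the variable names df and label are illustrative, not taken from the original pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix

# df is assumed to hold the dataset, with a numeric `wrong_fragment` feature
# and a ground-truth `label` column (1 = anomaly, 0 = normal).
x = df["wrong_fragment"]
z = (x - x.mean()) / x.std()

# Flag points whose absolute z-score exceeds 2 as anomalies.
pred = (np.abs(z) > 2).astype(int)

# Confusion matrix: rows = actual class, columns = predicted class.
print(confusion_matrix(df["label"], pred))
```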
Elliptic Envelope
Concept: Assumes data follows a Gaussian distribution and fits an ellipse to encompass the majority of data points.
Labeling: Predictions are -1 for anomalies and 1 for normal points. These are mapped to 1 and 0, respectively, for consistency.
Evaluation: The confusion matrix reveals the model's precision and recall. A significant number of false negatives suggests that the model misses many anomalies.
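A minimal sketch using scikit-learn's EllipticEnvelope, assuming a numeric feature matrix X and ground-truth labels y (1 = anomaly); the contamination value is an illustrative assumption.

```python
from sklearn.covariance import EllipticEnvelope
from sklearn.metrics import confusion_matrix

# X: numeric feature matrix, y: ground truth (1 = anomaly, 0 = normal) -- assumed inputs.
model = EllipticEnvelope(contamination=0.05, random_state=42)
raw = model.fit_predict(X)          # -1 = anomaly, 1 = normal

# Map -1 -> 1 (anomaly) and 1 -> 0 (normal) for consistency with the ground truth.
pred = (raw == -1).astype(int)

print(confusion_matrix(y, pred))
```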
Local Outlier Factor (LOF)
Concept: Measures the local density deviation of a given data point with respect to its neighbors.
Parameter Tuning: The choice of k (number of neighbors) significantly affects performance. A small k may lead to overfitting, while a large k might overlook local anomalies.
Evaluation: By varying k, metrics like accuracy, precision, and recall are plotted to identify the optimal value.
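A sketch of the k sweep with scikit-learn's LocalOutlierFactor; X and y are assumed as above, and the range of k values and contamination setting are illustrative.

```python
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import accuracy_score, precision_score, recall_score

# X: feature matrix, y: ground truth (1 = anomaly, 0 = normal) -- assumed inputs.
for k in [5, 10, 20, 35, 50]:
    lof = LocalOutlierFactor(n_neighbors=k, contamination=0.05)
    pred = (lof.fit_predict(X) == -1).astype(int)   # map -1 -> 1 (anomaly)
    print(f"k={k:>3}  acc={accuracy_score(y, pred):.3f}  "
          f"prec={precision_score(y, pred):.3f}  rec={recall_score(y, pred):.3f}")
```

Plotting these three series against k makes the trade-off visible: recall typically rises and then flattens, while precision degrades once k grows past the neighborhood size of genuine local anomalies.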
One-class SVM
Concept: Learns a decision function for novelty detection, classifying new data as similar or different from the training set.
Labeling: Predictions are -1 for anomalies and 1 for normal points, which are then mapped accordingly.
Evaluation: The confusion matrix indicates the model's ability to detect anomalies, balancing between false positives and false negatives.
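A sketch with scikit-learn's OneClassSVM; the kernel and nu settings are illustrative assumptions, and scaling is added because SVMs are sensitive to feature ranges.

```python
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix

# X: feature matrix, y: ground truth (1 = anomaly, 0 = normal) -- assumed inputs.
X_scaled = StandardScaler().fit_transform(X)

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
raw = ocsvm.fit_predict(X_scaled)   # -1 = anomaly, 1 = normal

pred = (raw == -1).astype(int)      # map to 1 = anomaly, 0 = normal
print(confusion_matrix(y, pred))
```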
Isolation Forest
Concept: An ensemble method that isolates anomalies instead of profiling normal data points.
Scoring: An anomaly score s(x, n) is computed as:

$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$

where E(h(x)) is the average path length required to isolate point x, and c(n) is the average path length of an unsuccessful search in a Binary Search Tree.
Labeling: Predictions are -1 for anomalies and 1 for normal points, mapped accordingly.
Evaluation: The confusion matrix helps assess the model's precision and recall, indicating its effectiveness in isolating anomalies.
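A sketch using scikit-learn's IsolationForest; n_estimators and contamination are illustrative assumptions. Note that score_samples returns the negated anomaly score, so lower values mean more anomalous.

```python
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix

# X: feature matrix, y: ground truth (1 = anomaly, 0 = normal) -- assumed inputs.
iso = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
raw = iso.fit_predict(X)            # -1 = anomaly, 1 = normal

pred = (raw == -1).astype(int)      # map to 1 = anomaly, 0 = normal
print(confusion_matrix(y, pred))

# score_samples returns the opposite of the anomaly score s(x, n): lower = more anomalous.
scores = iso.score_samples(X)
```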
Performance Metrics
For each method, the following metrics are computed:
Accuracy: Proportion of correct predictions (both anomalies and normal points).
Precision: Proportion of correctly identified anomalies out of all points labeled as anomalies.
Recall: Proportion of actual anomalies that were correctly identified.
These metrics are derived from the confusion matrix, which tabulates true positives, false positives, true negatives, and false negatives.
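A sketch of deriving the three metrics directly from the confusion matrix, assuming binary predictions pred and ground truth y with 1 = anomaly; both names are illustrative.

```python
from sklearn.metrics import confusion_matrix

# y: ground truth, pred: model output, both with 1 = anomaly, 0 = normal (assumed inputs).
tn, fp, fn, tp = confusion_matrix(y, pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # correct predictions over all points
precision = tp / (tp + fp)                    # flagged anomalies that are real
recall    = tp / (tp + fn)                    # real anomalies that were flagged

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```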