Anomaly detection techniques - Scoring and performance

Manju Lalwani

Scoring Mechanisms in Anomaly Detection

Z-Score Method

  • Calculation: For a given feature (e.g., wrong_fragment), the z-score is computed as:

    $$z = \frac{x - \mu}{\sigma}$$

    where x is the data point, μ is the mean, and σ (sigma) is the standard deviation.

  • Thresholding: Data points with |z| > 2 are considered anomalies.

  • Labeling: Assign 1 to anomalies and 0 to normal points.

  • Evaluation: The model's performance is assessed with a confusion matrix. For instance, a high number of false negatives indicates that many anomalies were not detected.
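The z-score steps above can be sketched in a few lines. This is a minimal illustration using only the standard library; the function name `zscore_labels` and the sample values are assumptions made for the example, not data from the article.

```python
from statistics import mean, pstdev


def zscore_labels(values, threshold=2.0):
    """Return 1 for points whose |z| exceeds the threshold, else 0."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature has no spread, so no point stands out.
        return [0] * len(values)
    return [1 if abs((x - mu) / sigma) > threshold else 0 for x in values]


# Hypothetical values for a feature such as wrong_fragment.
wrong_fragment = [0, 0, 0, 0, 0, 0, 0, 0, 0, 3]
labels = zscore_labels(wrong_fragment)
print(labels)
```

Here the mean is 0.3 and the standard deviation is 0.9, so only the last point (z = 3) crosses the threshold of 2.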

Elliptic Envelope

Concept: Assumes the data follows a Gaussian distribution and fits an ellipse that encompasses the majority of data points; points that fall outside the ellipse are treated as anomalies.

  • Labeling: Predictions are -1 for anomalies and 1 for normal points. These are mapped to 1 and 0, respectively, for consistency.

  • Evaluation: The confusion matrix reveals the model's precision and recall. A significant number of false negatives suggests that the model misses many anomalies.
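As a rough sketch of the labeling step, the snippet below fits scikit-learn's `EllipticEnvelope` on synthetic data and maps its -1/1 predictions to 1/0. The data, the `contamination` value, and the variable names are assumptions chosen for illustration, not the settings used in the article.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))          # synthetic normal points
outliers = np.array([[8.0, 8.0], [-9.0, 7.5], [10.0, -8.0]])    # synthetic anomalies
X = np.vstack([normal, outliers])

# contamination is the assumed share of anomalies in the data.
model = EllipticEnvelope(contamination=0.02, random_state=0)
raw = model.fit_predict(X)              # -1 = anomaly, 1 = normal

# Map to the convention used in this article: 1 = anomaly, 0 = normal.
labels = np.where(raw == -1, 1, 0)
print("flagged points:", int(labels.sum()))
```

The `contamination` parameter controls how many points end up outside the ellipse, so it directly trades false positives against false negatives.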

Local Outlier Factor (LOF)

Concept: Measures the local density deviation of a given data point with respect to its neighbors. Points with a substantially lower density than their neighbors are treated as outliers.

Parameter Tuning: The choice of k (number of neighbors) significantly affects performance. A small k makes the density estimate sensitive to noise (overfitting), while a large k might overlook local anomalies.

  • Evaluation: By varying k, metrics like accuracy, precision, and recall are plotted to identify the optimal value.
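One way to run such a sweep over k is sketched below with scikit-learn's `LocalOutlierFactor`. The synthetic data, the candidate values of k, and the `contamination` setting are all assumptions for illustration; the metrics are printed rather than plotted to keep the example short.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0, 1, size=(300, 2)),    # synthetic normal points
    rng.uniform(6, 9, size=(10, 2)),    # synthetic anomalies
])
y_true = np.array([0] * 300 + [1] * 10)  # 1 = anomaly, 0 = normal

results = {}
for k in (5, 10, 20, 35, 50):
    lof = LocalOutlierFactor(n_neighbors=k, contamination=0.05)
    y_pred = np.where(lof.fit_predict(X) == -1, 1, 0)   # map -1/1 to 1/0
    results[k] = {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
    }

for k, m in results.items():
    print(f"k={k:>2}  acc={m['accuracy']:.3f}  "
          f"prec={m['precision']:.3f}  rec={m['recall']:.3f}")
```

The `results` dictionary can be passed to any plotting library to draw the metric curves against k.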

One-class SVM

Concept: Learns a decision function for novelty detection, classifying new data as similar or different from the training set.

Labeling: Predictions are -1 for anomalies and 1 for normal points, which are then mapped to 1 and 0, respectively.

Evaluation: The confusion matrix indicates the model's ability to detect anomalies and how it balances false positives against false negatives.
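A minimal sketch of novelty detection with scikit-learn's `OneClassSVM` follows. It assumes the training set contains only normal points; the data, the `nu` value, and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X_train = rng.normal(0, 1, size=(200, 2))        # assumed to hold only normal points
X_new = np.array([[0.1, -0.2], [10.0, 10.0]])    # one typical point, one far away

# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(X_train)

raw = model.predict(X_new)              # -1 = anomaly, 1 = normal
labels = np.where(raw == -1, 1, 0)      # 1 = anomaly, 0 = normal
print(labels)
```

Raising `nu` tends to flag more points, which usually lowers false negatives at the cost of more false positives.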

Isolation Forest

Concept: An ensemble method that isolates anomalies instead of profiling normal data points.

Scoring: An anomaly score s(x, n) is computed as:

  • $$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}$$

    where E(h(x)) is the average path length needed to isolate point x across the trees, n is the number of samples, and c(n) is the average path length of an unsuccessful search in a binary search tree. Scores close to 1 indicate anomalies, while scores well below 0.5 indicate normal points.
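To make the scoring concrete, here is a small sketch that evaluates the score in its usual form, s(x, n) = 2^(-E(h(x)) / c(n)), with c(n) = 2H(n - 1) - 2(n - 1)/n and the harmonic number approximated as H(i) ≈ ln(i) + 0.5772. The function names and the sample size of 256 are assumptions for illustration; a library implementation may differ in detail.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA   # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(avg_path_length, n):
    """Score in (0, 1]; assumes n > 1 so that c(n) is non-zero."""
    return 2.0 ** (-avg_path_length / c(n))


n = 256  # hypothetical sub-sample size
print(anomaly_score(2.0, n))     # short path: score well above 0.5
print(anomaly_score(c(n), n))    # average path: score of 0.5
print(anomaly_score(20.0, n))    # long path: score well below 0.5
```

Shorter average path lengths push the score toward 1, which matches the intuition that anomalies are isolated in fewer splits.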

Labeling: Predictions are -1 for anomalies and 1 for normal points, mapped to 1 and 0, respectively.

Evaluation: The confusion matrix helps assess the model's precision and recall, indicating its effectiveness in isolating anomalies.

Performance Metrics

For each method, the following metrics are computed:

Accuracy: Proportion of correct predictions (both anomalies and normal points).

Precision: Proportion of correctly identified anomalies out of all points labeled as anomalies.

Recall: Proportion of actual anomalies that were correctly identified.

These metrics are derived from the confusion matrix, which tabulates true positives, false positives, true negatives, and false negatives.
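The relationship between the confusion matrix and the three metrics can be sketched as follows, using the article's convention of 1 for anomalies and 0 for normal points. The helper names and the example labels are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) with 1 = anomaly and 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 3 true anomalies, of which only 1 is detected.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
result = metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred), result)
```

In this example accuracy is 0.7 even though two of the three anomalies are missed, which is why precision and recall are reported alongside accuracy for imbalanced anomaly data.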
