Building High-Fidelity SIEM Rules: Taming Alert Noise


Ever feel like your SIEM is more of a noise generator than a threat detector? You're not alone. The key to taming the alert flood lies in moving beyond simple "100 emails in a minute" rules. What we need is a SIEM "Rule Constitution": a curated rule set that defines precisely when the system should raise an alert, built around contextual, combined behavioral patterns.
Let's break down the core pillars of a robust SIEM "Rule Constitution":
- Event Correlation:
  - What it is: Think of this as your SIEM playing detective. It isn't looking at just one clue (a single event) but piecing together multiple, seemingly unrelated activities. For example: a user login from an unusual IP address, followed by access to sensitive files, and then an attempt to upload data to an external site.
  - Fundamental Purpose: To uncover complex attack patterns or suspicious activities that individual events, in isolation, wouldn't reveal. It adds context to raw data.
  - Analogy: Imagine a security checkpoint at a sensitive facility. One check might verify an ID. Another might consult the list of personnel authorized for that specific day. A third could be a metal detector. Event correlation is like raising the alarm only when someone fails multiple relevant checks in a specific sequence or combination. A single failed check might be an anomaly, but several combined failures point to a higher risk.
- Smart Thresholds:
  - What it is: Instead of fixed, arbitrary numbers (e.g., "alert if more than 10 failed logins"), smart thresholds are dynamic and relative. They establish a baseline of normal behavior for individual users, groups, or entities, then trigger alerts on deviations from that specific baseline.
  - Fundamental Purpose: To significantly reduce false positives by understanding what "normal" looks like for a given entity. An activity that's anomalous for one user might be perfectly routine for another.
  - Analogy: Think of credit card fraud detection. A $5,000 purchase might be normal for a CEO who frequently travels and entertains (their baseline), but highly suspicious for an intern who usually only buys coffee and lunch (their very different baseline). Smart thresholds apply this personalized vigilance to user and system behavior.
- Temporal Windows:
  - What it is: Analyzing events and patterns within defined timeframes. It's not just what happened, but when, for how long, and how frequently.
  - Fundamental Purpose: To detect repeated anomalies or activities that are only suspicious when they occur in a certain sequence or density over time (e.g., brute-force attacks, slow data exfiltration).
  - Analogy: Consider a smoke detector. A tiny puff of smoke (one event) might not trigger it, but a consistent presence of smoke (multiple "events") over a 30-second window (the temporal window) sounds the alarm. This keeps the detector quiet when you briefly blow out a candle, while still catching a real fire.
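To make the three pillars concrete, here is a minimal sketch in plain Python. The event schema, the `BASELINES` table, and the `3x` multiplier are all invented for illustration; a real SIEM would express this in its own rule language. The sketch only alerts when a burst of failed logins (temporal window) exceeds the user's own baseline (smart threshold) and is followed by a sensitive-file access (event correlation):

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical per-user baselines: typical failed logins per hour.
# "bob" is a noisy service account, so a high count is normal for him.
BASELINES = {"alice": 2, "bob": 30}

def evaluate(events, window=timedelta(hours=1), multiplier=3):
    """Alert when a user's failed logins within `window` exceed 3x their own
    baseline AND are followed by a sensitive-file access."""
    alerts = []
    by_user = defaultdict(list)
    for e in sorted(events, key=lambda ev: ev["time"]):
        by_user[e["user"]].append(e)
    for user, evs in by_user.items():
        fails = [e for e in evs if e["type"] == "failed_login"]
        for f in fails:
            # Temporal window: failures within `window` of this one
            burst = [x for x in fails if f["time"] <= x["time"] <= f["time"] + window]
            # Smart threshold: compare against this user's own baseline
            if len(burst) <= multiplier * BASELINES.get(user, 1):
                continue
            # Event correlation: require a sensitive access after the burst
            if any(e["type"] == "sensitive_access" and e["time"] > burst[-1]["time"]
                   for e in evs):
                alerts.append(user)
                break
    return alerts
```

With this logic, eight failed logins in ten minutes flag alice (baseline 2) only if sensitive access follows, while the same burst from bob (baseline 30) stays silent.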
To visualize how these components work together, consider this simplified SIEM alert evaluation process:
```mermaid
flowchart TB
    subgraph ingest["Raw Event Ingestion"]
        direction LR
        A["Event Source 1 <br> e.g., Firewall Logs"]
        B["Event Source 2 <br> e.g., Active Directory"]
        C["Event Source 3 <br> e.g., Cloud App Logs"]
    end
    subgraph engine["SIEM Processing Engine"]
        direction TB
        D["Event Parsing & Normalization"]
        E{"Rule Matching: <br> Event Correlation Logic"}
        F{"Smart Threshold Evaluation: <br> Deviation from Baseline?"}
        G{"Temporal Window Analysis: <br> Pattern over Time?"}
        H[Alert Generation]
    end
    I["SOC Analyst Review <br> & Incident Response"]
    A --> D
    B --> D
    C --> D
    D --> E
    E -- Condition Met --> F
    E -- Condition Not Met --> D
    F -- Threshold Exceeded --> G
    F -- Within Normal Range --> D
    G -- Pattern Confirmed --> H
    G -- No Sustained Pattern --> D
    H --> I
```
This flowchart illustrates how events are ingested, processed against correlation rules, checked against smart thresholds, and analyzed within temporal windows before an alert is finally generated for SOC review.
🧪 Real-World Examples: Reducing False Positives with a SIEM "Rule Constitution"
Let's look at some practical rule examples that leverage these principles to minimize false alerts.
Sudden High-Volume External Sharing (No Prior History)
- Scenario: A user suddenly starts sharing files externally with a significantly larger number of recipients than their established norm.
- Tool: Kusto Query Language, KQL (Microsoft Sentinel, formerly Azure Sentinel)
```kusto
// Define the baseline period for normal behavior (e.g., past 30 days, excluding today)
let BaselineWindowStart = ago(30d);
let BaselineWindowEnd = ago(1d);
let DetectionWindow = ago(1h); // Look at activity in the last hour
// Calculate the baseline: distinct external recipients per user
let BaselineShareActivity = OfficeActivity
| where TimeGenerated between (BaselineWindowStart .. BaselineWindowEnd) // Filter for baseline period
| where Operation == "SharingSet" and UserType == "Member" // Focus on internal users sharing
| summarize dcount_BaselineTargetUser = dcount(TargetUser) by UserPrincipalName; // Count distinct external users shared with
// Analyze recent sharing activity
OfficeActivity
| where Operation == "SharingSet" and TimeGenerated > DetectionWindow // Filter for recent activity
| summarize Current_dcount_TargetUser = dcount(TargetUser) by UserPrincipalName // Count distinct recipients in the detection window
| join kind=inner BaselineShareActivity on UserPrincipalName // Combine recent activity with baseline data for each user
// Alert if current sharing is, for example, more than 3x the baseline distinct count
// AND the current count is above a minimum threshold (> 5) to avoid alerts for low-volume changes (1 vs 3)
| where Current_dcount_TargetUser > (dcount_BaselineTargetUser * 3) and Current_dcount_TargetUser > 5
```
- How it Reduces False Positives: This rule doesn't trigger on just any external sharing. It looks for a significant deviation (a 3x increase, with more than 5 distinct recipients) from the user's own established sharing pattern. A user who regularly shares with many external partners won't trigger it, but someone who rarely shares and suddenly does so widely will.
Login from New Geolocation Followed by Sensitive Data Operations
- Scenario: A user logs in from a country they've never logged in from before, and shortly thereafter, performs sensitive actions like downloading multiple files.
- Tool: Splunk
```
index=signin sourcetype=AzureAD // Start with Azure AD sign-in events
| lookup previously_seen_geo_for_user UserPrincipalName OUTPUTNEW IsNewCountry // Is this country new for the user? (assumes a regularly updated lookup table)
| where IsNewCountry = "true" // Filter for logins from new countries
| fields UserPrincipalName, SourceIP, Country, TimeGenerated // Keep relevant fields
// Join with O365/SharePoint audit logs for sensitive operations by the same user within a short window (e.g., 30 mins)
| join UserPrincipalName
    [ search index=o365 sourcetype=SharePoint AuditLogs Operation IN ("FileDownloaded", "FileCopied", "FileSensitivityLabeled")
      | fields UserPrincipalName, Operation, TimeGenerated_SharePoint ]
// Ensure SharePoint activity happened after the new-geo login, within a reasonable timeframe
| where TimeGenerated_SharePoint >= TimeGenerated AND TimeGenerated_SharePoint <= (TimeGenerated + 1800) // 1800 seconds = 30 minutes
| stats count by UserPrincipalName, Country, Operation // Count the combined suspicious activities
| where count > 5 // Alert if more than 5 combined instances occur (e.g., 5+ downloads after a new-geo login)
```
- How it Reduces False Positives: This rule uses event correlation. A login from a new country might be legitimate (user is traveling). Downloading files might also be normal. However, the combination of a login from a new location immediately followed by a high volume of file downloads significantly increases the likelihood of suspicious activity, reducing alerts from isolated, less risky events.
| Input Event Stream 1 (Entra ID Sign-in) | Input Event Stream 2 (SharePoint Audit) | Correlation Logic | Flagged Output (Alert) |
| --- | --- | --- | --- |
| User A logs in from NewCountry | User A downloads 6 files within 30 mins | NewCountry=True AND (FileDownloaded > 5) | Alert: User A, NewCountry login, 6 files downloaded. |
| User B logs in from KnownCountry | User B downloads 3 files | NewCountry=False OR (FileDownloaded <= 5) | No Alert: normal activity or insufficient correlated suspicious actions. |
| User C logs in from NewCountry | User C performs no file operations | NewCountry=True AND no relevant SharePoint activity | No Alert (for this rule): a new-geo login alone may be tracked but not alerted. |
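The decision logic in the table above can be sketched in a few lines of Python. The field names (`is_new_country`, `operation`, etc.) are illustrative, not a real SIEM schema, and the events are assumed to be pre-joined per user:

```python
from datetime import datetime, timedelta

def should_alert(login, file_events, window=timedelta(minutes=30), min_downloads=5):
    """Correlate a sign-in event with subsequent SharePoint file downloads.

    `login` is a dict like {"user": ..., "time": ..., "is_new_country": bool};
    `file_events` is a list of dicts like {"operation": ..., "time": ...}.
    """
    if not login["is_new_country"]:
        return False  # Known geography: this rule stays quiet
    downloads = [
        e for e in file_events
        if e["operation"] == "FileDownloaded"
        and login["time"] <= e["time"] <= login["time"] + window
    ]
    # Alert only on the combination: new geo AND more than `min_downloads` downloads
    return len(downloads) > min_downloads
```

Run against the table's three users, this flags only User A: a new-country login combined with six downloads inside the 30-minute window.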
Anomalous Email Sending Rate (with Whitelisting)
- Scenario: A user account starts sending an unusually high volume of emails to many unique recipients, potentially indicating a compromised account used for spam/phishing.
- Tool: QRadar (AQL-like pseudo-syntax)
```
// Define thresholds and the temporal window:
// WHEN count of emails sent by a user (total_email_count) is greater than 100
// AND average unique recipients per hour (avg_unique_recipients_per_hour) is greater than 10
// AND this behavior is observed FOR a duration of 1 HOUR
// AND the user is NOT IN the pre-defined 'authorized_bulk_senders_whitelist' group
// THEN RAISE an offense with severity = "Medium"
SELECT username,
       COUNT(*) AS total_emails,
       UNIQUECOUNT(recipient_email) AS unique_recipients
FROM events
WHERE LOGSOURCETYPENAME(logsourceid) = 'SMTP Logs' // Or your specific email log source
GROUP BY username
HAVING total_emails > 100 AND unique_recipients > 10 // Initial aggregation check
START '1 hour ago' // Temporal window; QRadar handles this with rule time windows
// The rule engine then checks whether the user is on the 'authorized_bulk_senders_whitelist'
// and, if not, triggers the offense.
```
- How it Reduces False Positives: This rule combines a smart threshold (high volume and many unique recipients sustained over an hour) with an exception list (whitelist). This ensures that legitimate bulk senders (e.g., marketing automation accounts) don't trigger alerts, focusing only on genuinely anomalous sending patterns from regular user accounts.
The relationship between individual events and these aggregated rules within your SIEM "Rule Constitution" can be seen as:
```mermaid
flowchart TB
    subgraph sources["Event Sources"]
        direction LR
        LOGINS["Login Events <br> (e.g., Azure AD, VPN)"]
        FILE_ACCESS["File Access Events <br> (e.g., SharePoint, Windows File Servers)"]
        EMAIL_SENT["Email Sending Events <br> (e.g., Exchange, O365)"]
        OTHER_LOGS[Other System Logs]
    end
    subgraph rules["SIEM Rule Constitution"]
        direction TB
        RC1["Rule 1: <br> New Geo Login + <br> Sensitive File Access"]
        RC2["Rule 2: <br> Anomalous Email Volume <br> (Excluding Whitelist)"]
        RC3["Rule 3: <br> Sudden High-Volume <br> External Sharing"]
    end
    ALERT_SYS[Alerting System]
    LOGINS --> RC1
    FILE_ACCESS --> RC1
    EMAIL_SENT --> RC2
    FILE_ACCESS -- "Assuming OfficeActivity for sharing includes file/document context" --> RC3
    RC1 --> ALERT_SYS
    RC2 --> ALERT_SYS
    RC3 --> ALERT_SYS
```
This diagram shows how different event streams feed into specific, complex rules within the SIEM "Rule Constitution", leading to more qualified alerts.
🤖 The Role of UBA/UEBA in This Ecosystem
User Behavior Analytics (UBA) and User and Entity Behavior Analytics (UEBA) systems take this a step further. They leverage machine learning to automatically baseline "normal" behavior for users and entities (like hosts or applications) and then detect deviations.
Key Advantages:
- Dynamic Risk Scoring: Instead of binary alerts, UBA/UEBA often assigns a risk score to users/entities. Alerts trigger when this score crosses a critical threshold, often after multiple minor deviations contribute to a rising score.
- Learning from Feedback: SOC analysts can provide feedback on alerts (e.g., "this was a false positive," "this was a true positive"), allowing the system to learn and refine its models. (Think of systems like Exabeam, Securonix, or Microsoft Sentinel's UEBA capabilities).
- Peer Group Analysis: Behavior is compared not just to an individual's own baseline but also to that of their peers (e.g., other users in the same department or role).
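Dynamic risk scoring can be illustrated with a toy accumulator. The anomaly weights, threshold, and decay factor below are invented for the example and do not reflect any vendor's actual model:

```python
class RiskScore:
    """Toy UEBA-style risk accumulator: minor anomalies raise a per-user score
    that decays over time; an alert fires only when the score crosses a threshold."""

    WEIGHTS = {  # Illustrative anomaly weights, not a real product's values
        "after_hours_login": 10,
        "new_country_login": 25,
        "bulk_download": 30,
    }

    def __init__(self, threshold=50, decay=0.9):
        self.threshold = threshold
        self.decay = decay   # Applied once per evaluation period
        self.scores = {}     # user -> current risk score

    def observe(self, user, anomaly):
        """Record an anomaly; return True when the user's score warrants an alert."""
        self.scores[user] = self.scores.get(user, 0.0) + self.WEIGHTS.get(anomaly, 5)
        return self.scores[user] >= self.threshold

    def tick(self):
        """End of period: decay everyone's score so stale anomalies fade out."""
        for user in self.scores:
            self.scores[user] *= self.decay
```

A single after-hours login (weight 10) stays well below the threshold, but after-hours plus new-country plus bulk-download activity (65) crosses it: several minor deviations accumulate into one high-confidence alert.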
Challenges:
- Initial Noise & Tuning: ML-based systems still require tuning. They can be noisy initially, especially in dynamic environments, and might generate false positives until properly calibrated.
- Environmental Adaptation: Effective UBA/UEBA requires good data hygiene and careful configuration to understand the specific context of your organization.
- Resource Intensive: Building and maintaining detailed behavioral baselines for every user and entity can be demanding on storage and compute resources.
🛠️ Building Effective Baselines: Tackling Storage and Compute Hurdles
The challenges of resource intensity for baselining are real, but not insurmountable. Here are effective approaches to build and manage behavioral baselines efficiently:
Leveraging Built-in UEBA Capabilities (e.g., Microsoft Sentinel):
- Modern SIEMs like Microsoft Sentinel come with powerful, integrated UEBA features. Sentinel, for instance, automatically collects and analyzes data from a multitude of sources including Azure AD sign-in logs, audit logs from Exchange, SharePoint, Teams, and other Microsoft services, as well as various third-party data connectors.
- It applies machine learning algorithms to this data to build behavioral baselines for users and entities. These algorithms are designed to identify anomalies and suspicious activities that deviate from established norms, helping to pinpoint insider threats, compromised accounts, and other sophisticated attacks with a higher degree of accuracy, thus reducing false positives.
Optimizing Log Storage for Cost and Performance:
- Log data is the bedrock of baselining, but storing everything indefinitely in a high-performance tier can be prohibitively expensive. Strategic storage optimization is key.
- Microsoft Sentinel, for example, utilizes Azure Monitor Logs, which offers different data plans to balance cost and accessibility:
- Analytics Logs (Interactive Tier): This is for your critical, high-value data that needs to be readily available for real-time querying, threat hunting, and alerting. It offers full KQL query capabilities and faster performance.
- Basic Logs: Designed for high-volume, verbose logs that are less frequently queried for real-time security operations. This tier has a lower ingestion cost but comes with limitations on query capabilities and retention (data is typically queryable for a shorter period like 8 days, though it can be archived).
- Archived Logs: For long-term retention (e.g., up to 7 or 12 years) of data primarily needed for compliance or occasional historical investigations. Data in archive needs to be restored to an interactive tier before it can be queried.
- By strategically assigning log types to these tiers (e.g., critical security logs to Analytics, verbose operational logs to Basic, and older data to Archive), organizations can significantly reduce storage costs while keeping the data needed for baselining and investigation available. Azure Monitor's related "Auxiliary Logs" plan extends this model further, offering an even lower-cost option for high-volume, low-fidelity data and enabling smarter data lifecycle management.
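A tiering policy like this can even be expressed as code. The table names mirror common Azure Monitor tables and the tier split follows the Analytics/Basic/Archive model described above, but the specific assignments and the 90-day cutoff are illustrative, not a recommendation for every environment:

```python
# Hypothetical data-tiering policy as code.
TIER_POLICY = {
    "SecurityEvent": "Analytics",  # High-value detections: keep hot, full KQL
    "SigninLogs": "Analytics",
    "AzureDiagnostics": "Basic",   # Verbose, rarely queried in real time
    "ContainerLog": "Basic",
}

def assign_tier(table, age_days, archive_after=90):
    """Resolve the storage tier for a table, demoting old data to Archive."""
    if age_days > archive_after:
        return "Archive"  # Long-term retention; restore before querying
    return TIER_POLICY.get(table, "Basic")  # Default verbose logs to the cheap tier
```

Encoding the policy this way makes the cost/availability trade-off reviewable and testable, rather than an ad-hoc portal setting.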
Prioritizing Critical Users and Assets:
- Instead of attempting to build detailed baselines for every single user and entity in a large organization from day one, adopt a phased or tiered approach.
- Focus initial baselining efforts on high-value targets:
- Accounts with privileged access (e.g., domain administrators, cloud administrators).
- Executive accounts.
- Users with access to highly sensitive data (e.g., financial systems, PII databases).
- Critical infrastructure components.
- This prioritization significantly reduces the initial computational load and storage requirements, making the baselining process more manageable and delivering quicker wins for high-risk areas. As your program matures, you can gradually expand the scope.
Utilizing External Tools & Managed Services:
- For organizations looking to augment their capabilities or manage costs, external solutions and managed services can be highly beneficial.
- Services like CyberProof's Managed XDR (as an example of a managed security service provider) can offer expertise in efficiently collecting, parsing, storing, and analyzing log data. They often leverage their own optimized platforms and economies of scale, potentially reducing overall costs and freeing up your internal security team to focus on higher-level threat analysis and response. Such services can also assist in the ongoing tuning and maintenance of baselines and detection rules.
🔄 Summary: Making Baselining Achievable
While the challenges of storage, computation, and complexity in building behavioral baselines are undeniable, they are addressable. By leveraging the advanced UEBA capabilities within modern SIEMs like Microsoft Sentinel, strategically planning log storage and data tiering, prioritizing critical assets, and considering the support of specialized managed services, organizations can implement effective and economically viable baselining. This well-planned approach is fundamental for building a proactive defense capable of detecting and responding to sophisticated, evasive threats.
🧩 Synergy: Combining Rule Constitution with UBA/UEBA
The sweet spot? A blended approach. Use your well-defined SIEM "Rule Constitution" for known bad patterns and compliance requirements. Layer UBA/UEBA on top to catch the unknown unknowns and subtle deviations that hardcoded rules might miss.
For instance, a logical rule might flag a "login from a new country." A UBA system could then enrich this by noting that this user's risk score was already elevated due to unusual after-hours activity earlier in the week, adding crucial context for the SOC analyst.
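That blended approach boils down to an enrichment step: attach the UEBA risk score to each rule-based alert before it reaches the analyst. A minimal sketch, with hypothetical alert and score shapes (a real SIEM would do this inside its correlation engine):

```python
def enrich_alert(rule_alert, ueba_scores, elevated=40):
    """Attach UEBA context to a rule-based alert so the analyst sees both signals.

    `rule_alert` is a dict like {"user": ..., "rule": ...}; `ueba_scores` maps
    user -> current behavioral risk score. All shapes are illustrative.
    """
    score = ueba_scores.get(rule_alert["user"], 0)
    return {
        **rule_alert,
        "ueba_risk_score": score,
        # A rule hit on an already-risky user deserves faster triage
        "priority": "high" if score >= elevated else "normal",
    }
```

The same "new country login" rule hit thus arrives as a high-priority alert for a user whose score was already elevated by after-hours activity, and as a routine one for everyone else.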
💬 Let's Discuss & Share Knowledge!
Building an effective SIEM "Rule Constitution" is an ongoing journey, not a destination.
- Have you tried combining predefined logical rules with UBA/UEBA behavioral analytics in your SIEM? What were your results?
- How do you strike the balance between high sensitivity for threat detection and the practical need to minimize false positives for your SOC team?
- What are some of your most effective "high-fidelity" rules or processes that have significantly reduced false alert volumes in your environment?
- Are there specific SIEM query languages or features you've found particularly powerful for building these contextual rules?
I'm eager to hear your practical experiences and insights. Together, we can refine proactive, contextual, and sustainable detection models!
Written by Topaz Hurvitz