Simplifying New Relic NRQL Alert Management: Using Policy ID for Bulk Operations
Introduction
In our previous blog post, we discussed how to automate enabling and disabling New Relic alerts using a Jenkins job. Today, we take this concept a step further and address a common challenge for DevOps teams managing complex alert setups: bulk operations on NRQL (New Relic Query Language) alert conditions.
The Problem
When dealing with New Relic alerts, a single policy may contain multiple alert conditions of various types. For this solution, we're focusing specifically on NRQL Query alert conditions. Managing these at scale can be challenging, especially when you need to enable or disable multiple NRQL alerts simultaneously. Previously, you might have had to provide details such as the query, condition ID, and threshold for each NRQL condition individually, resulting in lengthy and complex code.
The Solution: Using Policy ID for Bulk Operations on NRQL Alerts
To simplify this process, we've developed a solution that allows you to use the policy ID to enable or disable all the NRQL alert conditions within a policy in a single build. This approach significantly reduces code complexity and makes NRQL alert management more efficient.
How It Works
Our solution leverages the New Relic API and uses a bash script to:
Fetch all NRQL alert conditions associated with a specific policy ID
Iterate through each NRQL condition
Enable or disable each NRQL condition based on a parameter passed to the Jenkins job
Let's break down the key components of this solution:
Jenkins Job Configuration
The Jenkins job is set up with a choice parameter named "ACTION" that allows you to specify whether you want to enable or disable the NRQL alerts.
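As a rough sketch of how a script might guard this parameter, the helper below maps ACTION to the boolean that the API payload expects and rejects any other value. The function name action_to_enabled is hypothetical and not part of the original script; the sketch assumes Jenkins exposes the choice parameter to the shell step as the environment variable ACTION.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): convert the Jenkins
# ACTION choice into the JSON boolean used for the "enabled" field.
action_to_enabled() {
  case "$1" in
    enable)  echo "true" ;;
    disable) echo "false" ;;
    *)
      echo "Unknown ACTION: '$1' (expected 'enable' or 'disable')" >&2
      return 1
      ;;
  esac
}

# Fall back to 'disable' when ACTION is unset, then stop on invalid input.
ACTION="${ACTION:-disable}"
action_status=$(action_to_enabled "$ACTION") || exit 1
echo "enabled will be set to: ${action_status}"
```

Failing fast on an unexpected value means a typo in the parameter cannot silently disable every condition in the policy.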
The Script
Here's an overview of the script's main components:
API Key and Policy ID: We set these as variables at the beginning of the script.
Function to Update Condition: We define a function update_condition() that handles the API call to update each NRQL alert condition.
Fetching NRQL Alert Conditions: We use the New Relic API to fetch all NRQL alert conditions associated with the specified policy ID.
Iterating and Updating: We loop through each NRQL condition, extracting necessary information and calling our update function.
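To make the fetch-and-iterate steps concrete, here is a small offline sketch that runs jq against a sample response. The sample values are invented and trimmed to the fields the script reads (id, name, nrql.query, terms); a real response contains more fields. The function name summarize_conditions is hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative response: invented values, only the fields the script reads.
SAMPLE_RESPONSE='{
  "nrql_conditions": [
    {"id": 101, "name": "High error rate", "enabled": true,
     "nrql": {"query": "SELECT count(*) FROM TransactionError"},
     "terms": [{"threshold": "5", "operator": "above"}]},
    {"id": 102, "name": "Slow checkout", "enabled": false,
     "nrql": {"query": "SELECT average(duration) FROM Transaction"},
     "terms": []}
  ]
}'

# Hypothetical helper: print one tab-separated line per condition
# with its id, name, and number of terms.
summarize_conditions() {
  jq -r '.nrql_conditions[] | "\(.id)\t\(.name)\t\(.terms | length)"' <<< "$1"
}

summarize_conditions "$SAMPLE_RESPONSE"
```

Running a summary like this before the update loop is a cheap way to confirm which conditions a policy ID actually covers.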
Key Code Snippet
# Add jq to PATH
export PATH=$PATH:/c/tools/jq
API_KEY="NRAK-MSJ-----------------------"
POLICY_ID=5648697
# Jenkins parameter for enabling or disabling conditions
ACTION="${ACTION:-disable}" # Default action is 'disable'
# Function to update alert condition
update_condition() {
  local condition_id=$1
  local condition_name=$2
  local nrql_query=$3
  local terms=$4
  local action_status=$5 # "true" to enable, "false" to disable
  echo "Updating condition ID: ${condition_id} with name: ${condition_name} - Action: $action_status"
  # Prepare JSON payload including the nrql query and terms
  JSON_PAYLOAD=$(jq -n \
    --arg name "$condition_name" \
    --arg query "$nrql_query" \
    --argjson terms "$terms" \
    --argjson enabled "$action_status" \
    '{ nrql_condition: { name: $name, enabled: $enabled, nrql: { query: $query }, terms: $terms } }')
  # Update alert condition
  response=$(curl -s -X PUT "https://api.newrelic.com/v2/alerts_nrql_conditions/${condition_id}.json" \
    -H "X-Api-Key:${API_KEY}" \
    -H 'Content-Type: application/json' \
    -d "$JSON_PAYLOAD")
  # A successful update is expected to return the condition under "nrql_condition"
  # (there is no top-level "success" field), so compare the returned ID instead
  if [[ $(echo "$response" | jq -r '.nrql_condition.id // empty') == "$condition_id" ]]; then
    echo "Successfully updated condition ID: ${condition_id}"
  else
    echo "Failed to update condition ID: ${condition_id}. Response: $response"
  fi
}
# Fetch alert conditions using the policy ID
CONDITIONS=$(curl -s -X GET "https://api.newrelic.com/v2/alerts_nrql_conditions.json?policy_id=${POLICY_ID}" -H "X-Api-Key:${API_KEY}" -H 'Content-Type: application/json')
# Extract condition data using jq
CONDITION_DATA=$(echo "${CONDITIONS}" | jq -c '.nrql_conditions[]')
# Action: enable or disable based on the Jenkins parameter
if [[ "$ACTION" == "enable" ]]; then
  action_status=true
else
  action_status=false
fi
# Update alert conditions
while IFS= read -r condition; do
  # Skip blank input (for example, when the policy has no NRQL conditions)
  [[ -z "$condition" ]] && continue
  # Extract necessary fields
  condition_id=$(echo "$condition" | jq -r '.id')
  condition_name=$(echo "$condition" | jq -r '.name')
  nrql_query=$(echo "$condition" | jq -r '.nrql.query')
  terms=$(echo "$condition" | jq -c '.terms')
  # Check if terms is not empty
  if [ "$(echo "$terms" | jq -r 'length')" -gt 0 ]; then
    update_condition "$condition_id" "$condition_name" "$nrql_query" "$terms" "$action_status"
  else
    echo "Skipping condition ID: ${condition_id} with name: ${condition_name} because terms are empty."
  fi
done <<< "$CONDITION_DATA"
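The update function judges success from the JSON body of the response. A complementary check, sketched below, is to also look at the HTTP status code, which curl can append to its output with the -w '\n%{http_code}' option. The helper names http_status, http_body, and is_success are hypothetical, and the simulated response is invented so the sketch runs without network access.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the original script).

# Split curl output produced with: -w '\n%{http_code}'
# The last line is the HTTP status; everything before it is the body.
http_status() { printf '%s\n' "$1" | tail -n 1; }
http_body()   { printf '%s\n' "$1" | sed '$d'; }

# Return success only for 2xx status codes.
is_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *)           return 1 ;;
  esac
}

# With a real request this would look like (commented out to stay offline):
# raw=$(curl -s -w '\n%{http_code}' -X PUT "$url" \
#   -H "X-Api-Key:${API_KEY}" -H 'Content-Type: application/json' \
#   -d "$JSON_PAYLOAD")

# Simulated curl output, for illustration only.
raw=$'{"nrql_condition":{"id":101}}\n200'

if is_success "$(http_status "$raw")"; then
  echo "Update succeeded: $(http_body "$raw")"
else
  echo "Update failed with HTTP $(http_status "$raw")" >&2
fi
```

Checking the status code separately keeps the failure path working even when the API returns a body that is not valid JSON.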
Benefits of This Approach
Simplicity: One Jenkins build can enable or disable all NRQL alerts within a policy.
Scalability: Easy to manage large numbers of NRQL alert conditions.
Flexibility: Can be easily adapted to perform other bulk operations on NRQL alert conditions.
Error Handling: The script skips conditions whose terms are empty and logs the API response for any update that fails.
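As one example of the flexibility mentioned above, the sketch below narrows the bulk operation to conditions whose names start with a given prefix. The function name filter_by_prefix, the "[prod]" naming convention, and the sample data are all assumptions made for illustration; the original script updates every NRQL condition in the policy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit only the conditions whose name starts with a prefix.
filter_by_prefix() {
  local conditions_json=$1
  local prefix=$2
  jq -c --arg prefix "$prefix" \
    '.nrql_conditions[] | select(.name | startswith($prefix))' \
    <<< "$conditions_json"
}

# Invented sample data, trimmed to the fields used here.
SAMPLE='{"nrql_conditions":[
  {"id":1,"name":"[prod] High error rate"},
  {"id":2,"name":"[staging] High error rate"},
  {"id":3,"name":"[prod] Slow checkout"}]}'

# Print the IDs of the matching conditions, one per line.
filter_by_prefix "$SAMPLE" "[prod]" | jq -r '.id'
```

The filtered output has the same one-object-per-line shape as CONDITION_DATA, so it could feed the existing update loop unchanged.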
Limitations and Considerations
Note that this solution is designed specifically for NRQL alert conditions. If your New Relic policy contains other types of alert conditions (such as APM, browser, or mobile), this script will not affect them. To manage every alert type within a policy, you will need to extend the script to handle those condition types as well.
Conclusion
By leveraging the policy ID for bulk operations on NRQL alerts, we've significantly simplified the process of managing these specific New Relic alert conditions. This approach saves time, reduces the chance of errors, and makes it easier to maintain consistent alert states across your NRQL-based monitoring setup.
In your DevOps journey, always look for opportunities to simplify and automate repetitive tasks, but also be aware of the specific scope and limitations of your automation solutions.
Happy monitoring!
Written by GANESH KOPPULA