Simplifying New Relic NRQL Alert Management: Using Policy ID for Bulk Operations

Introduction

In our previous blog post, we discussed how to automate the process of enabling and disabling New Relic alerts using a Jenkins job. Today, we're taking this concept a step further by addressing a common challenge faced by DevOps teams managing complex alert systems, specifically focusing on NRQL (New Relic Query Language) alert conditions.

The Problem

When dealing with New Relic alerts, a single policy may contain multiple alert conditions of various types. For this solution, we're focusing specifically on NRQL Query alert conditions. Managing these at scale can be challenging, especially when you need to enable or disable multiple NRQL alerts simultaneously. Previously, you might have had to provide details such as the query, condition ID, and threshold for each NRQL condition individually, resulting in lengthy and complex code.

The Solution: Using Policy ID for Bulk Operations on NRQL Alerts

To simplify this process, we've developed a solution that allows you to use the policy ID to enable or disable all the NRQL alert conditions within a policy in a single build. This approach significantly reduces code complexity and makes NRQL alert management more efficient.

How It Works

Our solution leverages the New Relic API and uses a bash script to:

Fetch all NRQL alert conditions associated with a specific policy ID
Iterate through each NRQL condition
Enable or disable each NRQL condition based on a parameter passed to the Jenkins job

Let's break down the key components of this solution:

Jenkins Job Configuration

The Jenkins job is set up with a choice parameter named "ACTION" that allows you to specify whether you want to enable or disable the NRQL alerts.
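Jenkins passes build parameters to shell steps as environment variables, so it can help to validate ACTION before any alert is touched. The sketch below is one possible way to do that. The function name action_to_enabled is hypothetical and not part of the original script; it maps the choice to the JSON boolean used for the "enabled" field and rejects anything else.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): map the Jenkins ACTION
# choice to the JSON boolean used for the "enabled" field.
action_to_enabled() {
    case "$1" in
        enable)  echo "true" ;;
        disable) echo "false" ;;
        *)
            echo "Unknown ACTION: '$1' (expected 'enable' or 'disable')" >&2
            return 1
            ;;
    esac
}

# Default to 'disable' when the parameter is not set, as the script does
ACTION="${ACTION:-disable}"
if action_status=$(action_to_enabled "$ACTION"); then
    echo "ACTION=${ACTION} -> enabled=${action_status}"
else
    echo "Refusing to continue with an invalid ACTION" >&2
fi
```

In a real job you would likely exit with a non-zero status in the else branch so the build is marked as failed instead of silently disabling alerts.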

The Script

Here's an overview of the script's main components:

API Key and Policy ID: We set these as variables at the beginning of the script.
Function to Update Condition: We define a function update_condition() that handles the API call to update each NRQL alert condition.
Fetching NRQL Alert Conditions: We use the New Relic API to fetch all NRQL alert conditions associated with the specified policy ID.
Iterating and Updating: We loop through each NRQL condition, extracting necessary information and calling our update function.
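To make the extraction step concrete, here is a minimal sketch that runs jq against a made-up response. The sample JSON only mirrors the fields the script reads (id, name, nrql.query, terms); it is not real API output, and the helper name summarize_conditions is hypothetical.

```shell
#!/usr/bin/env bash
# Made-up sample shaped like the fields the script reads; not real API output.
SAMPLE_RESPONSE='{
  "nrql_conditions": [
    {"id": 101, "name": "High error rate", "enabled": true,
     "nrql": {"query": "SELECT count(*) FROM TransactionError"},
     "terms": [{"operator": "above", "threshold": "5"}]},
    {"id": 102, "name": "No terms yet", "enabled": false,
     "nrql": {"query": "SELECT count(*) FROM Transaction"},
     "terms": []}
  ]
}'

# Hypothetical helper: print "id<TAB>name<TAB>number of terms" per condition.
# "[]?" produces no output (instead of an error) if the key is missing.
summarize_conditions() {
    jq -r '.nrql_conditions[]? | [.id, .name, (.terms | length)] | @tsv'
}

echo "$SAMPLE_RESPONSE" | summarize_conditions
```

A summary like this is a cheap way to check which conditions the loop would update and which it would skip for having empty terms.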

Key Code Snippet

# Add jq to PATH
export PATH=$PATH:/c/tools/jq

# Placeholder key; in a real job, inject it from Jenkins credentials instead of hardcoding it
API_KEY="NRAK-MSJ-----------------------"
POLICY_ID=5648697

# Jenkins parameter for enabling or disabling conditions
ACTION="${ACTION:-disable}"  # Default action is 'disable'

# Function to update alert condition
update_condition() {
    local condition_id=$1
    local condition_name=$2
    local nrql_query=$3
    local terms=$4
    local action_status=$5  # "true" to enable, "false" to disable

    echo "Updating condition ID: ${condition_id} with name: ${condition_name} - Action: $action_status"

    # Prepare JSON payload including the nrql query and terms
    JSON_PAYLOAD=$(jq -n \
        --arg name "$condition_name" \
        --arg query "$nrql_query" \
        --argjson terms "$terms" \
        --argjson enabled "$action_status" \
        '{ nrql_condition: { name: $name, enabled: $enabled, nrql: { query: $query }, terms: $terms } }')

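    # Update alert condition; curl appends the HTTP status code on its own line
    local response http_code
    response=$(curl -s -w '\n%{http_code}' -X PUT "https://api.newrelic.com/v2/alerts_nrql_conditions/${condition_id}.json" \
        -H "X-Api-Key:${API_KEY}" \
        -H 'Content-Type: application/json' \
        -d "$JSON_PAYLOAD")
    http_code="${response##*$'\n'}"   # last line: status code
    response="${response%$'\n'*}"     # everything before it: response body

    # Judge success by the HTTP status; the body may not carry a "success" field
    if [[ "$http_code" == 2* ]]; then
        echo "Successfully updated condition ID: ${condition_id}"
    else
        echo "Failed to update condition ID: ${condition_id} (HTTP ${http_code}). Response: $response"
    fi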
}

# Fetch alert conditions using the policy ID
CONDITIONS=$(curl -s -X GET "https://api.newrelic.com/v2/alerts_nrql_conditions.json?policy_id=${POLICY_ID}" -H "X-Api-Key:${API_KEY}" -H 'Content-Type: application/json')

# Extract one compact JSON object per line; "[]?" yields nothing if the key is missing (e.g. an error response)
CONDITION_DATA=$(echo "${CONDITIONS}" | jq -c '.nrql_conditions[]?')

# Action: enable or disable based on the Jenkins parameter
if [[ "$ACTION" == "enable" ]]; then
    action_status=true
else
    action_status=false
fi

# Update alert conditions
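while IFS= read -r condition; do
    # Skip the blank line the here-string produces when no conditions were returned
    [[ -z "$condition" ]] && continue

    # Extract necessary fields
    condition_id=$(echo "$condition" | jq -r '.id')
    condition_name=$(echo "$condition" | jq -r '.name')
    nrql_query=$(echo "$condition" | jq -r '.nrql.query')
    terms=$(echo "$condition" | jq -c '.terms')  # keep terms as compact JSON for --argjson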

    # Check if terms is not empty
    if [ "$(echo "$terms" | jq -r 'length')" -gt 0 ]; then
        update_condition "$condition_id" "$condition_name" "$nrql_query" "$terms" "$action_status"
    else
        echo "Skipping condition ID: ${condition_id} with name: ${condition_name} because terms are empty."
    fi

done <<< "$CONDITION_DATA"
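Before running the job against a production policy, it may be useful to preview the request body without sending anything. The sketch below assumes the same payload shape as update_condition() and simply prints it. The helper name build_payload and the sample values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the same payload shape as update_condition(),
# but only print it so the change can be reviewed before any PUT is sent.
build_payload() {
    local name=$1 query=$2 terms=$3 enabled=$4
    jq -cn \
        --arg name "$name" \
        --arg query "$query" \
        --argjson terms "$terms" \
        --argjson enabled "$enabled" \
        '{ nrql_condition: { name: $name, enabled: $enabled, nrql: { query: $query }, terms: $terms } }'
}

# Made-up values for illustration only
build_payload "High error rate" "SELECT count(*) FROM TransactionError" \
    '[{"operator":"above","threshold":"5"}]' false
```

Because --argjson parses its value as JSON, passing anything other than true or false for the last argument makes jq fail early instead of sending a malformed payload.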

Benefits of This Approach

Simplicity: One Jenkins build can enable or disable all NRQL alerts within a policy.
Scalability: Easy to manage large numbers of NRQL alert conditions.
Flexibility: Can be easily adapted to perform other bulk operations on NRQL alert conditions.
Error Handling: The script skips conditions whose terms are empty and logs the API response whenever an update fails.
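As one illustration of the flexibility point, the same fetched data could be narrowed before the update loop runs, for example to conditions whose name starts with a given prefix. This is a sketch under assumptions: the helper filter_by_name_prefix and the sample data are hypothetical and not part of the original script.

```shell
#!/usr/bin/env bash
# Made-up sample shaped like the fields the script reads; not real API output.
SAMPLE_CONDITIONS='{
  "nrql_conditions": [
    {"id": 201, "name": "checkout - error rate"},
    {"id": 202, "name": "checkout - latency"},
    {"id": 203, "name": "search - latency"}
  ]
}'

# Hypothetical helper: emit only conditions whose name starts with the prefix,
# one compact JSON object per line (the same format the update loop reads).
filter_by_name_prefix() {
    local prefix=$1
    jq -c --arg prefix "$prefix" \
        '.nrql_conditions[]? | select(.name | startswith($prefix))'
}

echo "$SAMPLE_CONDITIONS" | filter_by_name_prefix "checkout"
```

The output could then stand in for CONDITION_DATA, so that only the matching conditions are enabled or disabled.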

Limitations and Considerations

It's important to note that this solution is specifically designed for NRQL Query alert types. If your New Relic policy contains other types of alert conditions (e.g., APM, browser, mobile, etc.), this script will not affect those. You will need to modify the script to handle different condition types if you want to manage all alert types within a policy.
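If you do extend the script, one starting point is to choose the list endpoint by condition type. The sketch below is only an assumption-laden outline: the helper conditions_endpoint is hypothetical, and the non-NRQL endpoint names should be verified against the New Relic REST API v2 documentation before use. Each condition type also has its own payload shape, so update_condition() would need a matching variant.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the REST API v2 list endpoint for a condition type.
# The non-NRQL endpoint names are assumptions to verify against the API docs.
conditions_endpoint() {
    local type=$1 policy_id=$2
    local base="https://api.newrelic.com/v2"
    case "$type" in
        nrql)       echo "${base}/alerts_nrql_conditions.json?policy_id=${policy_id}" ;;
        metric)     echo "${base}/alerts_conditions.json?policy_id=${policy_id}" ;;  # APM, browser, mobile
        synthetics) echo "${base}/alerts_synthetics_conditions.json?policy_id=${policy_id}" ;;
        *)
            echo "Unsupported condition type: $type" >&2
            return 1
            ;;
    esac
}

# Example with the policy ID used in the script
conditions_endpoint nrql 5648697
```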

Conclusion

By leveraging the policy ID for bulk operations on NRQL alerts, we've significantly simplified the process of managing these specific New Relic alert conditions. This approach saves time, reduces the chance of errors, and makes it easier to maintain consistent alert states across your NRQL-based monitoring setup.

In your DevOps journey, always look for opportunities to simplify and automate repetitive tasks, but also be aware of the specific scope and limitations of your automation solutions.

Happy monitoring!
