Automating GKE Cluster Upgrade Notifications to Google Chat using Pub/Sub and Cloud Functions

Upgrading Kubernetes clusters is crucial for security, stability, and feature updates—but manually tracking upgrade availability or progress can be challenging for teams. In this post, I’ll show you how to automate GKE cluster upgrade notifications to Google Chat using a combination of Pub/Sub Topics and Cloud Functions in Google Cloud Platform (GCP). This setup ensures DevOps and platform teams never miss vital cluster upgrade alerts and can react quickly.

What is Google Cloud Pub/Sub?

Google Cloud Pub/Sub is a scalable, managed messaging service that enables asynchronous communication between independent applications. Its core entities are:

  • Topics: Named resources producers (publishers) send messages to.

  • Subscriptions: Endpoints subscribers use to receive copies of all topic messages. Subscriptions can be:

    • Pull-based (subscriber polls for messages)

    • Push-based (Pub/Sub delivers directly to an HTTPS endpoint)

Pub/Sub topics act as a decoupled channel, meaning publishers don’t need to know about or wait for subscribers—messages are held until each active subscription receives them or until a retention period expires.
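
To make the push model concrete, here is a minimal sketch of decoding the JSON envelope that Pub/Sub POSTs to a push endpoint. The helper name `decode_push_envelope` and the sample values are assumptions made for illustration; only the general envelope shape (a `message` object with base64-encoded `data` and optional `attributes`, plus a `subscription` name) reflects how push delivery works.

```python
import base64
import json


def decode_push_envelope(body: str) -> dict:
    """Return the decoded data, attributes, and subscription from a push request body."""
    envelope = json.loads(body)
    message = envelope.get("message", {})
    # The message body arrives base64-encoded in the "data" field.
    data = base64.b64decode(message.get("data", "")).decode("utf-8")
    return {
        "data": data,
        "attributes": message.get("attributes", {}),
        "subscription": envelope.get("subscription", ""),
    }


# Hypothetical envelope, shaped like a Pub/Sub push delivery.
sample_body = json.dumps({
    "message": {
        "data": base64.b64encode(b"hello").decode("ascii"),
        "attributes": {"key": "value"},
        "messageId": "1",
    },
    "subscription": "projects/example-project/subscriptions/example-sub",
})

decoded = decode_push_envelope(sample_body)
print(decoded["data"])
```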

Architecture Overview

In organizations where each team manages multiple GCP projects, it’s helpful to consolidate all GKE cluster upgrade notifications from a team’s projects into a single Google Chat space. This allows every team to monitor upgrade events for all clusters they own without missing any critical alerts, and keeps communications organized by team.

Components setup

  1. Pub/Sub Topic Creation: We create a Pub/Sub topic in each project that hosts a GKE cluster.

  2. Centralized Cloud Function: In a central project (e.g., prod), a Cloud Function processes notifications from various topics.

  3. Push-Based Subscriptions: Push-based subscriptions are created for each topic in the projects, triggering the Cloud Function.

  4. Google Chat Integration: Notifications are routed to a separate Google Chat space for each team.
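
The routing idea behind these components (project, then owning team, then that team's Chat space) can be sketched as a pair of lookups. The team names, project numbers, webhook URLs, and the `resolve_webhook` helper below are all hypothetical placeholders, not values from a real setup.

```python
from typing import Optional

# Hypothetical team names, project numbers, and webhook URLs for illustration only.
TEAM_PROJECTS = {
    "team-a": ["111111111111", "222222222222"],
    "team-b": ["333333333333"],
}
TEAM_WEBHOOKS = {
    "team-a": "https://chat.example.invalid/team-a",
    "team-b": "https://chat.example.invalid/team-b",
}

# Invert the mapping once so each lookup is a single dictionary access.
PROJECT_TO_TEAM = {
    project: team
    for team, projects in TEAM_PROJECTS.items()
    for project in projects
}


def resolve_webhook(project_id: str) -> Optional[str]:
    """Return the webhook for the team owning project_id, or None if unmapped."""
    team = PROJECT_TO_TEAM.get(project_id)
    return TEAM_WEBHOOKS.get(team) if team else None


print(resolve_webhook("333333333333"))
```

Building the inverted dictionary up front keeps the per-message lookup constant-time, which is a small convenience rather than a requirement at this scale.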

Deployment Steps

  1. Create a Google Chat Space: Set up a Google Chat space and create a webhook for it.

  2. Service Account Setup: Create a Service Account in the project with the GKE Cluster and assign it the Pub/Sub Service Agent IAM Role.

     PROJECT_ID=<project-id>
     SERVICE_ACCOUNT=<service-account>
     gcloud projects add-iam-policy-binding $PROJECT_ID \
     --member=serviceAccount:$SERVICE_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com \
     --role=roles/pubsub.serviceAgent
    
  3. Create a Pub/Sub Topic:

     TOPIC_ID=<universe>-gke-cluster-notifications
     PROJECT_ID=<project_id>
     gcloud pubsub topics create $TOPIC_ID --project $PROJECT_ID
    
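  4. Enable GKE Notifications: Enable notifications on the GKE cluster (for example with the --notification-config flag of gcloud container clusters update), point them at the topic created in step 3, and filter for UpgradeEvent and UpgradeAvailableEvent.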

  5. Cloud Function Creation: Create a Cloud Function in the central project to process notifications. Enable authentication for the function.

  6. Create a Pub/Sub Subscription: Configure a push-based subscription with the following settings:

     TOPIC_NAME=<topic-name>
     CLOUD_FUNCTION_ENDPOINT=<http-endpoint-of-cloud-function>
     SERVICE_ACCOUNT=<sa-created-in-step-2>
     TOPIC_PROJECT=<project-name>
     gcloud pubsub subscriptions create <universe>-gke-notifications \
     --topic=$TOPIC_NAME --topic-project=$TOPIC_PROJECT \
     --push-auth-service-account=$SERVICE_ACCOUNT \
     --push-endpoint=$CLOUD_FUNCTION_ENDPOINT \
     --message-retention-duration=0 --expiration-period=never --ack-deadline=120 \
     --min-retry-delay=120 --max-retry-delay=600
    
  7. Assign Cloud Function Invoker Permission: Grant the Service Account created in step 2 the Cloud Functions Invoker role in the central project that hosts the Cloud Function.
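
When choosing the function's HTTP responses, it helps to keep in mind how a push subscription interprets them. To my understanding of the Pub/Sub push documentation, the status codes 102, 200, 201, 202, and 204 acknowledge a message, and any other status leads to redelivery, spaced according to the subscription's retry settings. A minimal sketch of that rule follows; the helper name `is_acknowledged` is hypothetical.

```python
# Status codes that Pub/Sub push delivery treats as an acknowledgement
# (assumption based on the Pub/Sub push subscription documentation).
ACK_STATUS_CODES = frozenset({102, 200, 201, 202, 204})


def is_acknowledged(status_code: int) -> bool:
    """Return True if a push endpoint response with this status acks the message."""
    return status_code in ACK_STATUS_CODES


for status in (200, 204, 404, 500):
    outcome = "acknowledged" if is_acknowledged(status) else "redelivered"
    print(status, outcome)
```

Under this rule, answering 400 or 404 for a message that can never succeed (an unmapped project, for instance) would cause it to be retried rather than dropped. Logging the problem and returning a 2xx status is one way to avoid that, at the cost of losing the automatic retry.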

Cloud Function Code

The Cloud Function processes incoming Pub/Sub messages and sends notifications to Google Chat. Here's a snippet of the function:

import functions_framework
import logging
import flask
import json
import requests
import base64
from datetime import datetime, timezone, timedelta

# Mapping teams to GChat webhook URLs
TEAM_TO_WEBHOOK = {
    "<team1>": "<https://chat.googleapis.com...>",
    "<team2>": "<https://chat.googleapis.com...>"
}

# Mapping teams to the project numbers they own (keys must match TEAM_TO_WEBHOOK)
TEAM_PROJECT_MAPPING = {
    "<team1>": ["<project-number1>", "<project-number2>"]
}

@functions_framework.http
def http(request: flask.Request):
    logging.basicConfig()
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.DEBUG)

    # Parse incoming Pub/Sub message
    try:
        message_body = request.get_data().decode()
        messages = json.loads(message_body)
        logger.debug(f"Received message: {messages}")
    except Exception as e:
        logger.error("Failed to parse Pub/Sub message", exc_info=True)
        return ("Invalid request format", 400)

    # Read everything through .get() so a missing "message" key yields a 400 below, not a crash
    message = messages.get("message", {})
    attributes = message.get("attributes", {})
    cluster_name = attributes.get("cluster_name")
    project_id = attributes.get("project_id")
    payload = json.loads(attributes.get("payload", "{}"))
    type_url = attributes.get("type_url", "").split(".")[-1]
    resource_type = payload.get("resourceType", "")
    publish_date = message.get("publish_time", "")

    if not project_id:
        logger.error("Missing 'project_id' in message attributes")
        return ("Missing project_id in attributes", 400)

    # Convert UTC time to IST
    try:
        utc_time = datetime.strptime(publish_date, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
        ist_time = utc_time.astimezone(timezone(offset=timedelta(hours=5, minutes=30))).strftime("%H:%M:%S - %d %B %Y")
    except Exception as e:
        logger.warning("Failed to parse publish_time", exc_info=True)
        ist_time = "Unknown"

    # Determine event type
    app_message = None
    if type_url == "UpgradeEvent" and resource_type == "MASTER":
        event_title = "Cluster upgrade has started."
        version = payload.get("currentVersion", "Unknown")
        data = base64.b64decode(messages["message"]["data"]).decode("utf-8")
        app_message = {
            "cardsV2": [
                {
                    "card": {
                        "header": {
                            "title": f"Cluster name: {cluster_name}",
                            "subtitle": event_title
                        },
                        "sections": [
                            {
                                "header": data,
                                "collapsible": False,
                                "widgets": [
                                    {
                                        "decoratedText": {
                                            "text": f"Previous version: {version}"
                                        }
                                    },
                                    {
                                        "textParagraph": {
                                            "text": f"Date: {ist_time} IST"
                                        }
                                    }
                                ]
                            }
                        ]
                    }
                }
            ]
        }
    elif type_url == "UpgradeAvailableEvent" and resource_type == "MASTER":
        event_title = "Cluster upgrade is available."
        version = payload.get("version", "Unknown")
        data = base64.b64decode(messages["message"]["data"]).decode("utf-8")
        app_message = {
            "cardsV2": [
                {
                    "card": {
                        "header": {
                            "title": f"Cluster name: {cluster_name}",
                            "subtitle": event_title
                        },
                        "sections": [
                            {
                                "header": data,
                                "collapsible": False,
                                "widgets": [
                                    {
                                        "textParagraph": {
                                            "text": f"Date: {ist_time} IST"
                                        }
                                    }
                                ]
                            }
                        ]
                    }
                }
            ]
        }
    else:
        logger.info(f"Ignored message of type '{type_url}' and resource '{resource_type}'")
        return "Ignored", 200

    # Determine which team owns the project
    team_name = None
    for team, project_ids in TEAM_PROJECT_MAPPING.items():
        if project_id in project_ids:
            team_name = team
            break

    if not team_name:
        logger.error(f"No team mapping found for project_id '{project_id}'")
        return ("No team mapping found for this project", 404)

    webhook_url = TEAM_TO_WEBHOOK.get(team_name)
    if not webhook_url:
        logger.error(f"No webhook found for team '{team_name}'")
        return ("No webhook found for this team", 404)

    # Send message to Google Chat
    headers = {
        "Content-Type": "application/json; charset=UTF-8"
    }
    try:
        response = requests.post(url=webhook_url, data=json.dumps(app_message), headers=headers, timeout=10)
        # Treat non-2xx responses from Google Chat as failures instead of logging success
        response.raise_for_status()
        logger.info(f"Notification sent successfully for cluster: {cluster_name}, team: {team_name}")
        return str(response.status_code), 200
    except Exception as e:
        logger.error("Failed to send message to Google Chat", exc_info=True)
        return "Failed to send message", 500
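
The two card payloads in the function differ only in their subtitle and widgets, so they could be produced by one helper. The following is a hedged sketch of that idea, not part of the original function: `build_card` is a hypothetical name, the sample values are placeholders, and for simplicity every line is rendered as a `textParagraph` widget (the original uses `decoratedText` for the version line).

```python
import json


def build_card(cluster_name: str, subtitle: str, section_header: str, lines: list) -> dict:
    """Build a Google Chat cardsV2 payload with one text widget per line."""
    widgets = [{"textParagraph": {"text": line}} for line in lines]
    return {
        "cardsV2": [
            {
                "card": {
                    "header": {
                        "title": f"Cluster name: {cluster_name}",
                        "subtitle": subtitle,
                    },
                    "sections": [
                        {
                            "header": section_header,
                            "collapsible": False,
                            "widgets": widgets,
                        }
                    ],
                }
            }
        ]
    }


# Hypothetical values for illustration.
card = build_card(
    cluster_name="example-cluster",
    subtitle="Cluster upgrade has started.",
    section_header="Example notification text",
    lines=["Previous version: 1.0.0-example", "Date: 10:00:00 - 01 January 2025 IST"],
)
print(json.dumps(card, indent=2))
```

With a helper like this, each branch of the event-type check would only need to supply its subtitle and the list of lines to display.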

Requirements

Ensure the following Python packages are included in your requirements.txt file:

blinker==1.7.0
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
cloudevents==1.10.1
deprecation==2.1.0
Flask==3.0.3
functions-framework==3.5.0
gunicorn==22.0.0
idna==3.7
itsdangerous==2.2.0
Jinja2==3.1.3
MarkupSafe==2.1.5
packaging==24.0
pyparsing==3.1.2
requests==2.31.0
urllib3==2.2.1
watchdog==4.0.0
Werkzeug==3.0.2

Conclusion

By following these steps, you can effectively set up a notification system for GKE cluster upgrades using Google Cloud Pub/Sub and Google Chat. This setup ensures that your teams are promptly informed about cluster upgrade events, allowing for better management and response to changes in your Kubernetes environment.

Written by

Vaishnavi Nazare

Passionate DevOps learner with a knack for automating workflows and engineering future-ready solutions. I enjoy exploring cloud platforms, CI/CD pipelines, and infrastructure as code to bridge the gap between development and operations. On a mission to transform manual processes into seamless, reliable deployments—one script at a time. Always curious, always building: Automating Today, Engineering Tomorrow.