GCP Instance Scheduler using Terraform
Managing cloud resources efficiently is crucial for optimizing costs and ensuring that resources are only utilized when needed. One common requirement is to automatically shut down virtual machines (VMs) during non-working hours to save costs. In this blog post, we will walk through how to set up a Google Cloud Platform (GCP) Instance Scheduler using Terraform, which will automatically turn off labeled VMs at a specified time every day.
Introduction
Google Cloud Scheduler, Pub/Sub, and Cloud Functions can be orchestrated together to create an automated instance scheduler. With Terraform, this setup becomes manageable and reproducible. Here, we will define Terraform configurations to:
Create a Pub/Sub topic.
Set up a Cloud Scheduler job.
Deploy a Cloud Function to stop VMs based on labels.
Manage IAM roles and permissions.
Terraform Configuration
Let's break down the Terraform files used in this setup:
1. Variables Definition (variables.tf)
variable "gcp_project" {
  default = "your gcp project_id here"
}

variable "cron_pattern" {
  default = "59 23 * * *" # set to every day, at 23:59
}

variable "scheduler_function_bucket" {
  default = "your bucket name here"
}

variable "label_key" {
  default = "instance-scheduler"
}

variable "label_value" {
  default = "enabled"
}
In the variables.tf file, we define variables to hold values for the GCP project ID, the cron pattern for scheduling, the name of the storage bucket, and the labels used to identify the VMs to be managed.
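To make the cron pattern easier to read, here is a minimal Python sketch that splits a five-field pattern into named fields. The helper name describe_cron is hypothetical and not part of the original setup. Note also that, unless a time_zone is set on the scheduler job, the schedule is most likely interpreted in UTC, so 23:59 may not match your local working hours.

```python
# Field names in standard five-field unix-cron order.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(pattern: str) -> dict:
    """Split a five-field cron pattern into named fields (hypothetical helper)."""
    parts = pattern.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {pattern!r}")
    return dict(zip(CRON_FIELDS, parts))


# The default value of var.cron_pattern from variables.tf.
fields = describe_cron("59 23 * * *")
print(fields)
```

Reading the result, minute 59 and hour 23 with wildcards elsewhere means the job fires once a day at 23:59.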
2. Provider Configuration (provider.tf)
provider "google" {
  project = var.gcp_project
  region  = "us-central1"
}
This configures the Google provider with the specified project ID and region.
3. Main Terraform Configuration (main.tf)
Pub/Sub Topic
resource "google_pubsub_topic" "topic" {
  name = "instance-scheduler-topic"
}
Creates a Pub/Sub topic which the Cloud Scheduler job will publish messages to.
Cloud Scheduler Job
resource "google_cloud_scheduler_job" "cr_job" {
  name        = "instance-scheduler"
  description = "Cloud Scheduler to turn off labeled VMs."
  schedule    = var.cron_pattern

  pubsub_target {
    topic_name = google_pubsub_topic.topic.id
    data       = base64encode("foo, bar..")
  }
}
Sets up a Cloud Scheduler job that triggers according to the cron pattern defined, publishing a message to the Pub/Sub topic.
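The data field above is a placeholder payload, and the function in this post never reads it. If you did want to pass information through the message, a Pub/Sub-triggered background function receives the body base64-encoded under the "data" key of the event. The sketch below shows one way to decode it; the helper name decode_pubsub_data is hypothetical.

```python
import base64


def decode_pubsub_data(event: dict) -> str:
    """Return the decoded Pub/Sub message body, or "" if the event has none."""
    data = event.get("data")
    if not data:
        return ""
    return base64.b64decode(data).decode("utf-8")


# Simulate the event produced by the scheduler job's placeholder payload.
sample_event = {"data": base64.b64encode(b"foo, bar..").decode("ascii")}
print(decode_pubsub_data(sample_event))
```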
Storage Bucket for Cloud Function
resource "google_storage_bucket" "bucket" {
  name = var.scheduler_function_bucket
}

resource "google_storage_bucket_object" "archive" {
  name   = "function.zip"
  bucket = google_storage_bucket.bucket.name
  source = "gcp_function/function.zip"
}
Defines a Google Cloud Storage bucket and uploads the Cloud Function code as a ZIP file.
Service Account and IAM Roles
resource "google_service_account" "svc_acc" {
  account_id   = "instance-scheduler-svc-acc"
  display_name = "instance-scheduler-svc-acc"
}

resource "google_project_iam_custom_role" "svc_acc_custom_role" {
  role_id     = "instance.scheduler"
  title       = "Instance Scheduler Role"
  description = "Ability to turn off instances with a specific label."
  permissions = [
    "compute.instances.list",
    "compute.instances.stop",
    "compute.zones.list",
  ]
}

resource "google_project_iam_member" "svc_acc_iam_member" {
  project = var.gcp_project
  # The variable is named gcp_project in variables.tf; var.project is not declared.
  role    = "projects/${var.gcp_project}/roles/${google_project_iam_custom_role.svc_acc_custom_role.role_id}"
  member  = "serviceAccount:${google_service_account.svc_acc.email}"
  depends_on = [
    google_service_account.svc_acc
  ]
}
Creates a service account and assigns it a custom role with permissions to list and stop instances, and list zones.
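Each permission in the custom role corresponds to one Compute Engine API call made by the function. As a rough sanity check, the sketch below compares a role's permission list against what those three calls need. The names REQUIRED_PERMISSIONS and missing_permissions are hypothetical and exist only for this illustration.

```python
# Compute API method used by the function -> IAM permission it requires.
REQUIRED_PERMISSIONS = {
    "zones.list": "compute.zones.list",
    "instances.list": "compute.instances.list",
    "instances.stop": "compute.instances.stop",
}


def missing_permissions(granted):
    """Return the required permissions that the given role does not grant."""
    return sorted(set(REQUIRED_PERMISSIONS.values()) - set(granted))


# The permissions list from the custom role defined above.
role_permissions = [
    "compute.instances.list",
    "compute.instances.stop",
    "compute.zones.list",
]
print(missing_permissions(role_permissions))
```

An empty result means the role covers every call the function makes, and nothing more is needed for a stop-only scheduler.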
Cloud Function
resource "google_cloudfunctions_function" "instance_scheduler_function" {
  name                  = "instance-scheduler-function"
  available_memory_mb   = 128
  source_archive_bucket = google_storage_bucket.bucket.name
  source_archive_object = google_storage_bucket_object.archive.name
  # Python 3.8 reached end of life upstream in October 2024;
  # prefer a runtime that Cloud Functions currently supports.
  runtime               = "python38"
  description           = "Cloud function to do the instance scheduling."

  event_trigger {
    event_type = "google.pubsub.topic.publish"
    resource   = google_pubsub_topic.topic.name
    failure_policy {
      retry = false
    }
  }

  timeout               = 180
  entry_point           = "instance_scheduler_start"
  service_account_email = google_service_account.svc_acc.email

  environment_variables = {
    PROJECT     = var.gcp_project
    LABEL_KEY   = var.label_key
    LABEL_VALUE = var.label_value
  }

  depends_on = [
    google_service_account.svc_acc
  ]
}
Deploys a Cloud Function that is triggered by messages published to the Pub/Sub topic. It uses the service account and stops instances based on the specified labels.
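The environment_variables block is how the function receives its settings at runtime. A minimal sketch of reading and validating them is shown below, so that a missing value fails fast with a clear message instead of producing a confusing API error later. The helper name load_config and the sample values are hypothetical.

```python
import os


def load_config(env=os.environ) -> dict:
    """Read the scheduler settings, failing fast if any is missing (hypothetical helper)."""
    names = ("PROJECT", "LABEL_KEY", "LABEL_VALUE")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {name.lower(): env[name] for name in names}


# Sample values mirroring the Terraform defaults; "my-project" is a placeholder.
config = load_config(
    {"PROJECT": "my-project", "LABEL_KEY": "instance-scheduler", "LABEL_VALUE": "enabled"}
)
print(config)
```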
Python Cloud Function Code (main.py)
The Cloud Function is implemented in Python to authenticate with the GCP API, list instances, and stop those that match the specified labels.
import os

from googleapiclient import discovery
# Note: oauth2client is deprecated; google-auth is its successor.
from oauth2client.client import GoogleCredentials

API_VERSION = 'v1'
RESOURCE_TYPE = 'compute'


def authenticate():
    # Build a Compute Engine client from Application Default Credentials.
    # Errors propagate so that a failed run shows up in the function logs.
    credentials = GoogleCredentials.get_application_default()
    return discovery.build(RESOURCE_TYPE, API_VERSION, credentials=credentials, cache_discovery=False)


def gather_zones(project, service):
    # Return the names of all zones available to the project.
    zones = service.zones().list(project=project).execute()
    return [zone['name'] for zone in zones.get('items', [])]


def turn_instance_off(project, service, instance, zone):
    # Stop one VM. A failure is logged and skipped so it does not block the rest.
    try:
        service.instances().stop(project=project, zone=zone, instance=instance).execute()
        print(f"Successfully turned off VM {instance} in project {project}, zone {zone}.")
    except Exception as error:
        print(f"Failed to turn off VM {instance} in zone {zone}: {error}")


def locate_instances(project, service, zones, label_key, label_value):
    # Stop every RUNNING instance that carries the configured label.
    for zone in zones:
        instances = service.instances().list(project=project, zone=zone, filter=f"labels.{label_key}={label_value}").execute()
        for instance in instances.get('items', []):
            if instance['status'] == "RUNNING":
                turn_instance_off(project, service, instance['name'], zone)


def instance_scheduler_start(event, context):
    # Entry point: read settings from the environment, then stop matching VMs.
    project = os.environ.get('PROJECT')
    label_key = os.environ.get('LABEL_KEY')
    label_value = os.environ.get('LABEL_VALUE')
    service = authenticate()
    zones = gather_zones(project, service)
    locate_instances(project, service, zones, label_key, label_value)
Explanation of the Python Cloud Function Code
Authentication: The authenticate function sets up authentication using default application credentials.
Gather Zones: The gather_zones function retrieves a list of zones in the project.
Turn Instance Off: The turn_instance_off function stops a VM instance.
Locate Instances: The locate_instances function lists instances in each zone and stops those with the specified label.
Entry Point: The instance_scheduler_start function is the entry point for the Cloud Function, retrieving environment variables and coordinating the process.
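One limitation worth knowing: the instance listing above reads only the first page of results per zone. For zones with many labeled VMs, the API client's list_next method can be used to follow the remaining pages. The sketch below is one possible variation, not the code from the repository, and the generator name iter_instances is hypothetical.

```python
def iter_instances(service, project, zone, label_key, label_value):
    """Yield every matching instance in a zone, following result pages."""
    request = service.instances().list(
        project=project, zone=zone, filter=f"labels.{label_key}={label_value}"
    )
    while request is not None:
        response = request.execute()
        # A page with no matches has no 'items' key at all.
        yield from response.get("items", [])
        # list_next returns None once there are no further pages.
        request = service.instances().list_next(
            previous_request=request, previous_response=response
        )
```

Inside locate_instances, the loop over instances could then iterate over iter_instances(service, project, zone, label_key, label_value) and keep the same RUNNING check.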
Conclusion
This Terraform setup, combined with the Cloud Function, creates a robust solution to automatically manage VM instances in GCP. By scheduling shutdowns of labeled instances, you can optimize your resource usage and reduce costs. The use of Terraform ensures that the configuration is version-controlled and easily reproducible.
For the full code and detailed implementation, you can visit the GCP Instance Scheduler using Terraform repository on GitHub. This repository contains all the Terraform configuration files and the Python Cloud Function code discussed in this blog post. You can clone the repository and customize it according to your needs to automate the scheduling of your GCP instances.
Written by
Mikaeel Khalid
I am a software engineer, certified in AWS and GCP, and I am passionate about solving complex technical problems. My goal here is to help others by sharing my learning journey.