Deploy a Kubernetes cluster on GKE

Aditya Khadanga
6 min read

Google Kubernetes Engine (GKE) is a powerful, managed Kubernetes service that allows you to deploy containerized applications at scale. In this blog, we’ll walk through how to provision a GKE cluster using Terraform, an infrastructure-as-code (IaC) tool that helps automate the entire process.

Whether you're a DevOps engineer, cloud enthusiast, or just starting out, this guide will give you hands-on knowledge of deploying production-ready infrastructure on GCP.

Deploying a GKE cluster using Terraform is a clean, reusable way to manage infrastructure as code.

Here's a full step-by-step guide using Terraform. (We'll perform everything in Google Cloud Shell.)

βœ… Prerequisites

  1. Google Cloud account with billing enabled

  2. Terraform installed (it comes pre-installed in Google Cloud Shell)

  3. gcloud CLI installed and configured, with the required APIs enabled (Compute Engine, Service Usage, Kubernetes Engine)

     gcloud version #check whether gcloud is installed
    
     #if it is installed, you will see output like this:
     Google Cloud SDK 456.0.0
     bq 2.0.84
     core 2023.04.01
     gcloud 2023.04.01
     kubectl 1.28.1
    
     #if you get "command not found", it's not installed.
     #Install it from: https://cloud.google.com/sdk/docs/install
    
     gcloud auth login
     gcloud auth list
     gcloud config list
     gcloud config set project <your-project-id>
    
     # you should see output similar to this:
     [core]
     account = your.email@gmail.com
     project = your-gcp-project-id
    

    If the project isn't set, run:

     gcloud config set project <your-gcp-project-id>
    
  4. A service account with permissions (Kubernetes Engine Admin, Compute Admin, etc.)

    Create a Service Account:

     gcloud iam service-accounts create terraform-gke \
       --description="Service account for Terraform to manage GKE" \
       --display-name="Terraform GKE Admin"
    
     # Grant the IAM roles below (replace <your-project-id> with your project ID)
    
     gcloud projects add-iam-policy-binding <your-project-id> \
       --member="serviceAccount:terraform-gke@<your-project-id>.iam.gserviceaccount.com" \
       --role="roles/container.admin"
    
     gcloud projects add-iam-policy-binding <your-project-id> \
       --member="serviceAccount:terraform-gke@<your-project-id>.iam.gserviceaccount.com" \
       --role="roles/compute.admin"
    
     gcloud projects add-iam-policy-binding <your-project-id> \
       --member="serviceAccount:terraform-gke@<your-project-id>.iam.gserviceaccount.com" \
       --role="roles/iam.serviceAccountUser"
    

πŸ“Œ Above roles allow Terraform to:

  • Create/manage GKE clusters

  • Manage networking and compute

  • Act as the service account (roles/iam.serviceAccountUser)
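As an alternative to the gcloud binding commands above, the same grants can be expressed in Terraform itself. This is a sketch, not the method used in this guide; it assumes you would also manage the service account from the same config:

```hcl
# Hypothetical alternative: manage the service account and one of its
# role bindings in Terraform instead of via gcloud.
resource "google_service_account" "terraform_gke" {
  account_id   = "terraform-gke"
  display_name = "Terraform GKE Admin"
}

resource "google_project_iam_member" "gke_admin" {
  project = var.project_id
  role    = "roles/container.admin"
  member  = "serviceAccount:${google_service_account.terraform_gke.email}"
}
```

The same pattern repeats for `roles/compute.admin` and `roles/iam.serviceAccountUser`.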

Create and Download a JSON Key:

gcloud iam service-accounts keys create ~/terraform-gke-key.json \
  --iam-account terraform-gke@<your-project-id>.iam.gserviceaccount.com

This JSON file is your service account key β€” keep it safe! πŸ”

Export Credentials for Terraform

Before running Terraform:

export GOOGLE_APPLICATION_CREDENTIALS=~/terraform-gke-key.json
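Note that Cloud Shell's home directory persists across sessions, but exported environment variables do not. An optional convenience is to persist the export in `~/.bashrc` so you don't have to re-run it every session:

```shell
# Optional: persist the credentials variable across Cloud Shell sessions.
echo 'export GOOGLE_APPLICATION_CREDENTIALS="$HOME/terraform-gke-key.json"' >> ~/.bashrc
grep GOOGLE_APPLICATION_CREDENTIALS ~/.bashrc
```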

πŸ—‚οΈ Project Structure

mkdir gke-cluster && cd gke-cluster
touch main.tf variables.tf outputs.tf terraform.tfvars

This gives the following layout:

gke-cluster/
├── main.tf
├── variables.tf
├── outputs.tf
└── terraform.tfvars

πŸ“„ main.tf

provider "google" {
  project = var.project_id
  region  = var.region
  zone    = var.zone   # used by zonal resources; helps if you face quota issues
}

resource "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.region   # regional cluster
  # location = var.zone   # zonal cluster: use this instead if you face quota issues

  remove_default_node_pool = true
  initial_node_count       = 1

  networking_mode = "VPC_NATIVE"
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "primary-node-pool"
  location   = var.region   # match the cluster's location
  # location = var.zone     # use this instead if you face quota issues
  cluster    = google_container_cluster.primary.name

  node_config {
    machine_type = "e2-medium"
    disk_size_gb  = 50               # πŸ‘ˆ Add this line to reduce disk usage
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]
  }

  initial_node_count = 1
}

🧠 Why This Works

  • GCP by default allocates 100 GB SSD per node

  • You didn’t override that, so you're getting the default quota usage

  • Even with one node, if your cluster is regional, GKE might try to deploy across three zones within the region. So:

    1 node pool Γ— 1 node Γ— 3 zones Γ— 100 GB = 300 GB

    That's why you're seeing a request for 300 GB even if you're provisioning just one node.

    Fix: deploy a zonal cluster instead of a regional one, i.e. change:

    location = var.region »»» location = var.zone

    (see the error section below for more details)
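The quota math above can be checked with quick shell arithmetic (the 3-zone count is typical for a GCP region, and 100 GB is GKE's default boot disk size):

```shell
nodes_per_zone=1   # initial_node_count
zones=3            # a regional cluster replicates the pool across each zone
disk_gb=100        # GKE's default boot disk size per node
total_gb=$((nodes_per_zone * zones * disk_gb))
echo "SSD_TOTAL_GB requested: $total_gb"
```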

πŸ“„ variables.tf

variable "project_id" {
  type = string
}

variable "region" {
  type    = string
  default = "asia-south1"
}

variable "zone" {
  type    = string
  default = "asia-south1-a"
}

variable "cluster_name" {
  type    = string
  default = "my-gke-cluster"
}

πŸ“„ terraform.tfvars

project_id   = "your-gcp-project-id"
region       = "asia-south1"
zone         = "asia-south1-a"
cluster_name = "my-gke-cluster"

πŸ“„ outputs.tf

output "cluster_name" {
  value = google_container_cluster.primary.name
}

output "kubeconfig_command" {
  value = "gcloud container clusters get-credentials ${google_container_cluster.primary.name} --region ${var.region}"
}
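If you also want the control-plane address handy, an extra output can be added. This is an optional sketch; `endpoint` is an attribute of `google_container_cluster`, marked sensitive here so it isn't echoed in plain `terraform output`:

```hcl
output "cluster_endpoint" {
  value     = google_container_cluster.primary.endpoint
  sensitive = true
}
```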

πŸš€ Steps to Deploy

βœ… 1. Authenticate

Make sure you're authenticated with GCP and have set the project:

gcloud auth application-default login

βœ… 2. Initialize & Apply Terraform

terraform init
terraform plan
terraform apply -auto-approve

βœ… 3. Configure kubectl

After deployment finishes, run the output command:

gcloud container clusters get-credentials my-gke-cluster --region asia-south1
# use --zone asia-south1-a instead if you created a zonal cluster
kubectl get nodes

Boom β€” you're connected to your GKE cluster! πŸŽ‰


βœ… 4. Deploy Something

kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --type=LoadBalancer --port=80
kubectl get svc

🧹 To Destroy:

terraform destroy -auto-approve

Errors You May Face

ERROR: (gcloud.container.clusters.list) ResponseError: code=403, message=Kubernetes Engine API has not been used in project before or it is disabled.

Means: The service account has the right permissions, but the GKE API is still disabled, so it can't do anything

βœ… How to Fix It

  1. Go to this link in your browser (from the error message):
    πŸ‘‰ Enable Kubernetes Engine API

  2. Click "Enable" (top of the page).

  3. Wait 1–2 minutes for it to fully activate.
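Instead of clicking through the console, the API can also be enabled declaratively from the same Terraform config. A sketch using the `google_project_service` resource (`disable_on_destroy = false` leaves the API enabled when the resource is destroyed):

```hcl
# Enable the Kubernetes Engine API from Terraform itself.
resource "google_project_service" "container" {
  project            = var.project_id
  service            = "container.googleapis.com"
  disable_on_destroy = false
}
```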

Error 403: Insufficient regional quota to satisfy request: resource "SSD_TOTAL_GB": request requires '300.0' and is short '50.0'. project has a quota of '250.0' with '250.0' available.

Means: GKE cluster creation is requesting 300 GB of SSD disk, but your project only has a quota of 250 GB in that region.
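The numbers in the error line up exactly:

```shell
requested=300   # what the regional cluster asks for (3 zones x 100 GB)
quota=250       # regional SSD_TOTAL_GB quota in the project
echo "Short by $((requested - quota)) GB"
```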

βœ… How to Fix It

  1. Go to main.tf.

  2. Manually control the disk size by adding disk_size_gb to the node_config block in your google_container_node_pool resource:

    disk_size_gb = 50 # πŸ‘ˆ Add this line to reduce disk usage

Error: Cannot destroy cluster because deletion_protection is set to true. Set it to false to proceed with cluster deletion.

Means: your GKE cluster has deletion protection enabled, which prevents Terraform (or anyone) from accidentally deleting it.

βœ… How to Fix It

Solution: Disable deletion_protection in Terraform

Just update your Terraform config for the cluster to explicitly disable deletion protection.

πŸ”§ In google_container_cluster block, add:

resource "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.zone

  remove_default_node_pool = true
  initial_node_count       = 1

  networking_mode = "VPC_NATIVE"
  deletion_protection = false  # πŸ‘ˆ Add this line
}

Then do:

terraform apply

This will update the cluster and turn off deletion protection.

Steps to Access the nginx App (LoadBalancer Service)

πŸ” 1. Get the External IP

Run:

kubectl get svc

You should see something like:

NAME         TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
nginx        LoadBalancer   10.0.12.45     <pending>        80:32345/TCP   1m

⏳ If the EXTERNAL-IP is still <pending>, it means GCP is still provisioning the external load balancer (can take 1–3 minutes).

βœ… 2. Once EXTERNAL-IP Is Ready

Example:

NAME         TYPE           CLUSTER-IP     EXTERNAL-IP       PORT(S)        AGE
nginx        LoadBalancer   10.0.12.45     34.133.45.23       80:32345/TCP   2m

Now you can access your NGINX app in the browser:

➑️ http://34.133.45.23

βœ… Final Thoughts

Using Terraform to deploy GKE clusters allows you to manage Kubernetes infrastructure declaratively, reproducibly, and at scale. With just a few configuration files, you can spin up or destroy entire clusters, which is especially powerful for CI/CD pipelines and infrastructure automation.


Written by

Aditya Khadanga

A DevOps practitioner dedicated to sharing practical knowledge. Expect in-depth tutorials and clear explanations of DevOps concepts, from fundamentals to advanced techniques. Join me on this journey of continuous learning and improvement!