Automate AKS Clusters with Terraform, Service Principal, and Azure Key Vault


Why Automate AKS Clusters with Terraform?
Terraform is an Infrastructure as Code (IaC) tool that allows you to define and manage infrastructure through code. By using Terraform, you can automate the creation and management of Azure resources, ensuring consistency and reducing the risk of manual errors.
In modern DevOps practices, automation is key to efficiently managing and deploying infrastructure. One of the powerful combinations is using Terraform with Azure Kubernetes Service (AKS) to provision and manage Kubernetes clusters. This blog post will guide you through the process of automating the provisioning of an AKS cluster using Terraform and a service principal. We'll cover prerequisites, step-by-step instructions, and common issues you might encounter.
Additionally, we'll discuss the inclusion of Azure Monitoring for the AKS cluster, an essential component for maintaining the health and performance of your Kubernetes deployments.
Visualize Your Automated AKS Setup
Essential Tools and Setup for AKS Automation
Before we begin, ensure you have the following tools installed and configured:
Azure CLI: You need the Azure CLI to interact with your Azure subscription. Install it from here.
kubectl: The Kubernetes command-line tool,
kubectl
, is needed to interact with your AKS cluster. Install it from here.Terraform: Install Terraform from here.
Additionally, you need an Azure subscription where you can create the required resources.
Flow Chart:
As we are provisioning AKS using the Terraform modular approach, if you are not familiar with Terraform modules, please check out blog on modular Terraform.
Features of Project:
To make the Terraform configuration more robust and maintainable, considered the following enhancements:
Modularized Terraform Configuration: Split the configuration into modules for better organization.
Added Detailed Comments: Included comments in your Terraform files to explain each resource and its purpose.
Implemented Output Variables: Used output variables to capture and display critical information like the kubeconfig location and Key Vault secrets.
Use Cases:
This automated AKS setup can be used in various scenarios:
1. DevOps Automation:
Automate the setup and management of Kubernetes clusters as part of your CI/CD pipeline. This ensures that your development, testing, and production environments are consistent and reproducible.
2. Multi-Environment Deployment:
Easily deploy Kubernetes clusters across multiple environments (e.g., development, staging, production) with consistent configurations. Each environment can have its own set of variables and configurations, ensuring isolated and secure deployments.
3. Disaster Recovery:
By using Terraform, you can quickly recreate your entire AKS infrastructure in case of a disaster. This ensures minimal downtime and quick recovery, as the entire setup is defined in code and can be reapplied.
4. Compliance and Security:
Ensure that your AKS clusters are compliant with organizational security policies by defining and managing all configurations through code. This includes secure storage of credentials, role assignments, and monitoring setups.
5. Scalable Infrastructure:
Automate the scaling of your AKS clusters based on workload demands. This allows you to dynamically adjust the size and capacity of your clusters, optimizing resource usage and cost.
Key Steps in Your Automation Journey
The provided Terraform configuration accomplishes the following tasks:
Creates an Azure Resource Group.
Provisions a Service Principal for managing Azure resources.
Creation of App Registration to generate Service Principal.
Assigns the Contributor role to the Service Principal.
Creates an Azure Key Vault and stores the Service Principal credentials.
Deploys an Azure Kubernetes Service (AKS) cluster.
Create all the resources to do monitoring of AKS cluster.
Outputs the kubeconfig file for accessing the AKS cluster.
Detailed Breakdown of the Configuration
Let's dive into the Terraform code to understand what each part does.
Project Structure:
.
├── BackendResources.sh // Create resources to store .tfstate file as backend
├── modules
│ ├── aks
│ │ ├── main.tf
│ │ ├── output.tf
│ │ └── variables.tf
│ ├── keyvault
│ │ ├── main.tf
│ │ ├── output.tf
│ │ └── variables.tf
│ ├── monitoring
│ │ ├── main.tf
│ │ ├── output.tf
│ │ └── variables.tf
│ └── ServicePrincipal
│ ├── main.tf
│ ├── output.tf
│ └── variables.tf
├── versions.tf //contains versions of provider
├── main.tf
├── variables.tf
├── terraform.tfvars
├── output.tf
├── backend.tf // Connecting to backend and storing tfstate file in backend
└── README.md
1. Provider Configuration
provider "azurerm" {
features {
}
}
The provider
block configures the Azure Resource Manager (azurerm) provider. This provider allows Terraform to interact with Azure resources.
2. Creating a Resource Group
resource "azurerm_resource_group" "rg" {
name = var.rgname
location = var.location
}
The azurerm_resource_group
resource creates an Azure Resource Group, which acts as a container for all Azure resources.
3. Service Principal Module
module "ServicePrincipal" {
source = "./modules/ServicePrincipal"
service_principal_name = var.service_principal_name
depends_on = [
azurerm_resource_group.rg
]
}
The ServicePrincipal
module provisions a Service Principal, which is an Azure Active Directory application used to manage resources programmatically. This module ensures secure and controlled access to your Azure resources.
4. Role Assignment
resource "azurerm_role_assignment" "rolespn" {
scope = "/subscriptions/${var.subscription_id}"
role_definition_name = "Contributor"
principal_id = module.ServicePrincipal.service_principal_object_id
description = "Role Based Access Control, Contributor role assignment to ServicePrincipal"
depends_on = [
module.ServicePrincipal
]
}
The azurerm_role_assignment
resource assigns the Contributor role to the Service Principal, granting it permissions to manage Azure resources within the specified subscription.
5. Key Vault Module
module "keyvault" {
source = "./modules/keyvault"
keyvault_name = var.keyvault_name
location = var.location
resource_group_name = var.rgname
service_principal_name = var.service_principal_name
service_principal_object_id = module.ServicePrincipal.service_principal_object_id
service_principal_tenant_id = module.ServicePrincipal.service_principal_tenant_id
client_id = module.ServicePrincipal.client_id
client_secret = module.ServicePrincipal.client_secret
depends_on = [
module.ServicePrincipal
]
}
The keyvault
module creates an Azure Key Vault, a secure place to store secrets, keys, and certificates. This module also stores the Service Principal credentials in the Key Vault for secure access.
6. Storing Service Principal Credentials in Key Vault
resource "azurerm_key_vault_secret" "spn_secret" {
name = module.ServicePrincipal.client_id
value = module.ServicePrincipal.client_secret
key_vault_id = module.keyvault.keyvault_id
depends_on = [
module.keyvault,
]
}
The azurerm_key_vault_secret
resource stores the Service Principal's client ID and secret in the Key Vault, ensuring these sensitive credentials are kept secure.
7. Creating an AKS Cluster
module "aks" {
source = "./modules/aks/"
service_principal_name = var.service_principal_name
client_id = module.ServicePrincipal.client_id
client_secret = module.ServicePrincipal.client_secret
location = var.location
resource_group_name = var.rgname
depends_on = [
module.ServicePrincipal
]
}
The aks
module provisions an Azure Kubernetes Service (AKS) cluster. This module utilizes the Service Principal for authentication and creates the Kubernetes cluster in the specified location and resource group.
8. Outputting the kubeconfig File
resource "local_file" "kubeconfig" {
depends_on = [module.aks]
filename = "./kubeconfig"
content = module.aks.config
}
The local_file
resource creates a kubeconfig file, which is necessary for interacting with the AKS cluster using kubectl. This file is stored locally and contains the configuration details required to connect to the cluster.
The kubeconfig
file is used to authenticate and authorize access to a Kubernetes cluster from Jenkins jobs or agents. Here’s how it is typically used and its purpose:
Purpose of kubeconfig File:
Authentication: Kubernetes uses client certificates, tokens, or other authentication methods to authenticate users and services. The
kubeconfig
file contains authentication information such as API server endpoint, client certificate, client key, and token if applicable.Authorization: Once authenticated, Kubernetes checks the permissions of the authenticated entity against its RBAC (Role-Based Access Control) rules to authorize access to specific resources.
Cluster Configuration: It also includes configuration details like the cluster name, context (combination of cluster, namespace, and user), and other settings required to interact with the Kubernetes cluster.
9. Monitoring Module
module "monitoring" {
source = "./modules/monitoring"
log_analytics_workspace_name = var.log_analytics_workspace_name
location = var.location
resource_group_name = var.rgname
aks_cluster_id = module.aks.cluster_id
depends_on = [
module.aks
]
}
The monitoring module sets up an Azure Log Analytics Workspace and configures diagnostic settings for the AKS cluster. This enables the collection of logs and metrics, providing valuable insights into the cluster's performance and health.
10. Log Analytics Workspace
resource "azurerm_log_analytics_workspace" "log_analytics" {
name = var.log_analytics_workspace_name
location = var.location
resource_group_name = var.rgname
sku = "PerGB2018"
retention_in_days = 30
}
The azurerm_log_analytics_workspace resource creates a Log Analytics Workspace, which acts as a central repository for logs and metrics collected from the AKS cluster.
11. Diagnostic Settings for AKS
resource "azurerm_monitor_diagnostic_setting" "aks_monitoring" {
name = "aks-monitoring"
target_resource_id = module.aks.cluster_id
log_analytics_workspace_id = azurerm_log_analytics_workspace.log_analytics.id
log {
category = "kube-apiserver"
enabled = true
retention_policy {
enabled = true
days = 30
}
}
log {
category = "kube-audit"
enabled = true
retention_policy {
enabled = true
days = 30
}
}
log {
category = "kube-controller-manager"
enabled = true
retention_policy {
enabled = true
days = 30
}
}
log {
category = "kube-scheduler"
enabled = true
retention_policy {
enabled = true
days = 30
}
}
log {
category = "cluster-autoscaler"
enabled = true
retention_policy {
enabled = true
days = 30
}
}
metric {
category = "AllMetrics"
enabled = true
retention_policy {
enabled = true
days = 30
}
}
}
The azurerm_monitor_diagnostic_setting resource configures the diagnostic settings for the AKS cluster, specifying which logs and metrics to capture and send to the Log Analytics Workspace.
Monitoring an AKS cluster is crucial for maintaining its health and performance. Azure Monitor provides comprehensive monitoring and diagnostics for AKS clusters.
Step-by-Step Guide to Recreate the Setup
Clone the repository:
git clone https://github.com/vsingh55/Automated-AKS-Cluster-Provisioning-Using-Terraform-and-Service-Principal.git
Create Storage Account and Blob Container for Terraform State: To manage Terraform state files, we use Azure Storage Account.
First, log in to your Azure account using the Azure CLI:
az login
Run the following script to create a storage account and a blob container:
#!/bin/bash
#Declare variable names
RESOURCE_GROUP_NAME=backend-rg STORAGE_ACCOUNT_BASE_NAME=backendsa4tf RANDOM_STRING=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 5 | head -n 1) STORAGE_ACCOUNT_NAME="${STORAGE_ACCOUNT_BASE_NAME}${RANDOM_STRING}" CONTAINER_NAME=tfstate
#Create resource group
az group create --name $RESOURCE_GROUP_NAME --location centralindia
#Create storage account
az storage account create --resource-group $RESOURCE_GROUP_NAME --name $STORAGE_ACCOUNT_NAME --sku Standard_LRS --encryption-services blob
#Create blob container
az storage container create --name $CONTAINER_NAME --account-name $STORAGE_ACCOUNT_NAME
echo "Storage Account Name: $STORAGE_ACCOUNT_NAME"
This will give you unique storage account name copy and paste it to terraform.tfvars file.
Edit .tfvars file fill all the required values and you can also change the names of resources.
Now run the following cmd's:
terraform init
terraform plan
terraform apply
After provisioning and utilizing AKS never forget to destroy resources.
tf destroy --auto-approve
Troubleshooting Common Terraform Issues
While running the Terraform commands, you might encounter some common errors. Here are a few and their resolutions:
Error 1: Service Principal Not Found
Error: creating Cluster: (Managed Cluster Name
"clusternew-aks-cluster" / Resource Group "rgname"):
containerservice.ManagedClustersClient#CreateOrUpdate:
Failure sending request: StatusCode=404 -- Original Error:
Code="ServicePrincipalNotFound" Message="Service principal
clientID: xxxx-xxxxx-xxxx-xxxxx not found in Active Directory
tenant xxxx-xxxxx-xxxx-xxxxx, Please see https://aka.ms/
aks-sp-help for more details."
Resolution: Rerun the tf apply
command. This could be a bug in the particular provider version.
Error 2: Key Vault Permission Issue
Error: checking for presence of existing Secret
"xxxx-xxxxx-xxxx-xxxxx" (Key Vault "https://keyvaultname.
vault.azure.net/"): keyvault.BaseClient#GetSecret: Failure
responding to request: StatusCode=403 -- Original Error:
autorest/azure: Service returned an error. Status=403
Code="Forbidden" Message="Caller is not authorized to perform
action on resource.\r\nIf role assignments, deny assignments
or role definitions were changed recently, InnerError=
{"code":"ForbiddenByRbac"}
on main.tf line 46, in resource "azurerm_key_vault_secret"
"example":
46: resource "azurerm_key_vault_secret" "example" {
Resolution: Ensure the user has the Key Vault Admin role even if the user has the owner role.
Conclusion
By using Terraform to automate the provisioning of Azure resources and an AKS cluster, you can streamline your infrastructure management processes. This configuration ensures that all resources are created consistently and securely, with appropriate role assignments and secure storage of sensitive information.
Including Azure Monitor for your AKS cluster enhances visibility into the health and performance of your Kubernetes deployments, providing valuable insights for maintaining a robust and reliable infrastructure.
I hope this project has provided a clear and comprehensive guide using Terraform for automating Azure infrastructure provisioning and AKS deployment.
Subscribe to my newsletter
Read articles from Vijay Kumar Singh directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Vijay Kumar Singh
Vijay Kumar Singh
I'm Vijay Kumar Singh, a Linux, DevOps, Cloud enthusiast learner and contributor in shell scripting, Python, networking, Kubernetes, Terraform, Ansible, Jenkins, and cloud (Azure, GCP, AWS) and basics of IT world. 💻✨ Constantly exploring innovative IT technologies, sharing insights, and learning from the incredible Hashnode community. 🌟 On a mission to build robust solutions and make a positive impact in the tech world. 🚀 Let's connect and grow together! #PowerToCloud