Automating Azure Image builds with HashiCorp Packer and GitHub Actions

Lukas Rottach

Welcome to part two of my blog series Managing Azure Images. In this part of the series we will implement some exciting changes to our existing project, like automated, recurring image builds.

Recap

In the first part of this series we explored the capabilities of building Azure images with HashiCorp Packer. We created a basic Packer template in HCL, looked at its structure, and used provisioners to install software, manage reboots, and properly deprovision the image with Sysprep. By following along, you built your first managed Azure Image manually and saw firsthand how Packer simplifies image creation, reduces manual work, and improves consistency across environments. Now it's time to take this a step further and automate the entire build process.

Foreword

Before we start, a few words. Why do we even want to develop this project further at all? Well, the initial setup from our previous blog comes with a couple of downsides. First and foremost, we have to run the Packer build manually each time we want a fresh Azure image, and let's be honest, no one enjoys repetitive manual tasks. Automating the process and scheduling regular image builds not only saves time but also ensures that our images are consistently up to date, secure, and ready whenever we need them.

Another problem with our current setup is that you can't run the Packer build twice without making manual adjustments. Since we're using Azure Managed Images, trying to create an image with the same name will cause the build to fail: the second run will unavoidably trip over the image created by the first one. Clearly, we need a more dynamic approach to image naming and management to avoid such collisions and ensure seamless, repeatable builds.

➡️ As always, you are welcome to check out the code on my GitHub repository instead of going through the development step by step: https://github.com/lrottach/blog-az-image-packer

Introduction

In this blog post, we're taking our Azure image management setup to the next level by introducing automation with GitHub Actions. We'll leverage GitHub Actions workflows to schedule recurring Packer image builds, ensuring we always have fresh and up-to-date images without needing to kick off builds manually. Of course, you can build the same pipeline with whichever CI/CD provider you prefer. Additionally, we'll implement an Azure Compute Gallery, allowing us to efficiently store, replicate, and version our images with ease. By making our Packer builds dynamic, each scheduled run will automatically produce a unique version, helping us maintain clarity, consistency, and flexibility across our deployments. Let's dive in!

Authentication

Before we dive deeper into building our GitHub Actions pipeline, let's talk briefly about authentication. To automate deployments and execute Packer builds from GitHub Actions, we need secure and reliable authentication to Azure. For my scenario, I used Federated Identity, which allows GitHub Actions workflows to authenticate directly with Azure without the need to manage or store secrets explicitly. With federated identity, GitHub is granted trusted access to Azure through OpenID Connect (OIDC), eliminating the overhead and security concerns associated with traditional service principal credentials stored as GitHub secrets.

In this post, I won't dive into every detail about setting up federated identity. Otherwise, we might end up writing another full series just on authentication! Instead, I've prepared Terraform code that automatically deploys and configures all the required resources for federated identity, including the GitHub OIDC provider, role assignments, and the necessary Azure Active Directory application configurations.

For those interested in the detailed setup and to understand exactly what's deployed, feel free to check out the provided Terraform code in the GitHub repository under the ./src/auth/ folder.
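
For orientation, here is a minimal sketch of the pieces such a setup typically consists of. It assumes the azuread provider in a 2.x version and ties the trust to the dev environment of my repository; the names and the Contributor scope are illustrative, the repository contains the real configuration.

data "azurerm_subscription" "current" {}

# Application and service principal the workflow will authenticate as
resource "azuread_application" "github" {
  display_name = "app-packer-github-actions"
}

resource "azuread_service_principal" "github" {
  application_id = azuread_application.github.application_id
}

# Trust OIDC tokens issued by GitHub for the 'dev' environment of the repository
resource "azuread_application_federated_identity_credential" "github_dev" {
  application_object_id = azuread_application.github.object_id
  display_name          = "github-actions-dev"
  audiences             = ["api://AzureADTokenExchange"]
  issuer                = "https://token.actions.githubusercontent.com"
  subject               = "repo:lrottach/blog-az-image-packer:environment:dev"
}

# Allow the workflow to deploy resources into the subscription
resource "azurerm_role_assignment" "github_contributor" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Contributor"
  principal_id         = azuread_service_principal.github.object_id
}

The subject claim is the interesting part: GitHub puts it into the OIDC token, and Azure only exchanges that token for credentials when it matches the configured value.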

📢
I haven't figured out how to authenticate the Packer azure-arm provider directly with federated authentication. Instead, I use Azure CLI to authenticate and then apply that session in my Packer configuration. If you have any tips or insights on this, please let me know.

Storing Images

In this series, we're introducing an Azure Compute Gallery to efficiently manage, store, and distribute our Azure images. Azure Compute Gallery, formerly known as Shared Image Gallery, provides a dedicated Azure service specifically designed to handle virtual machine images at scale.

The primary benefit of using Azure Compute Gallery is that it simplifies image management by allowing you to store multiple versions of an image, replicate images across multiple Azure regions for better redundancy and quicker deployments, and provide organized access control. This ensures improved reliability, availability, and version control, which becomes especially valuable in larger or geographically distributed environments.

Deploying the prerequisites

Before we begin, I assume you are familiar with the authentication part of the project. As I mentioned earlier, we need additional resources for the upcoming setup. We will no longer use Azure Managed Images to store our custom images. Instead, we will first deploy an Azure Compute Gallery. This Terraform deployment will be executed by the pipeline later.

# Resource Group
resource "azurerm_resource_group" "main" {
  name     = var.resource_group_name
  location = var.location
}

# Azure Compute Gallery
resource "azurerm_shared_image_gallery" "main" {
  name                = var.gallery_name
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  description         = "Azure Compute Gallery for sharing VM images."
}

# Image Definition
resource "azurerm_shared_image" "main" {
  name                = var.image_definition_name
  gallery_name        = azurerm_shared_image_gallery.main.name
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  os_type             = var.os_type

  hyper_v_generation = "V2"

  identifier {
    publisher = var.image_publisher
    offer     = var.image_offer
    sku       = var.image_sku
  }
}

Here we define the Azure Compute Gallery itself and one single image definition. An image definition in Azure Compute Gallery acts as a logical grouping for different versions of a specific virtual machine image. It specifies key properties like operating system type, generation, and VM configuration, providing a structured way to organize and manage your image lifecycle.

If you browse through the Terraform deployment, you will see that I am using OIDC authentication for both the provider and the backend.

# provider.tf
provider "azurerm" {
  features {}
  use_oidc = true
}

# versions.tf
terraform {
  required_version = ">=1.11.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }

  backend "azurerm" {
    key      = "terraform.tfstate"
    use_oidc = true
  }
}
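
Note that the backend block is intentionally partial. The remaining settings, like the state storage account, are typically supplied at init time; the names below are placeholders, not the values from my setup:

# Supply the missing backend settings at init time (placeholder names)
terraform init \
  -backend-config="resource_group_name=rg-tfstate" \
  -backend-config="storage_account_name=sttfstatedemo" \
  -backend-config="container_name=tfstate"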

In addition, we need to define some outputs. Later, we will read values like the name of the Compute Gallery and the image definition from these outputs and use them in our Packer template.

# outputs.tf
output "resource_group_id" {
  description = "The ID of the Resource Group."
  value       = azurerm_resource_group.main.id
}

output "resource_group_name" {
  description = "The name of the Resource Group."
  value       = azurerm_resource_group.main.name
}

output "gallery_id" {
  description = "The ID of the Azure Compute Gallery."
  value       = azurerm_shared_image_gallery.main.id
}

output "gallery_name" {
  description = "The name of the Azure Compute Gallery."
  value       = azurerm_shared_image_gallery.main.name
}

output "image_definition_id" {
  description = "The ID of the image definition."
  value       = azurerm_shared_image.main.id
}

output "image_definition_name" {
  description = "The name of the image definition."
  value       = azurerm_shared_image.main.name

}

Extending the Packer template

Now that we’ve set up our Azure Compute Gallery, it’s time to revisit and extend our original Packer template. The existing template, as it currently stands, won’t be compatible with our new approach, since it’s configured to produce a static Azure Managed Image. To integrate smoothly with the Compute Gallery, we’ll need to enhance the template by passing in additional parameters and adjusting the output configuration accordingly. This way, each Packer build will automatically publish its resulting image version directly into our Azure Compute Gallery, making the entire process dynamic and scalable.

➡️ Check out the full Packer template here: https://github.com/lrottach/blog-az-image-packer/blob/main/src/az-windows-11-ent.pkr.hcl

First things first: going forward, the template needs some additional inputs, mostly to implement the new publishing step to an Azure Compute Gallery.

// Variables
// **********************

variable "subscription_id" {
  type    = string
  default = "${env("ARM_SUBSCRIPTION_ID")}"
}

variable "image_gallery_name" {
  type    = string
}

variable "image_gallery_resource_group" {
  type    = string
}

variable "image_definition_name" {
  type    = string
}

variable "image_version" {
  type    = string
  default = "1.0.0"
}
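
Variables without a default, like the gallery name, have to be supplied at build time. Besides -var flags, Packer also picks up environment variables prefixed with PKR_VAR_, which is convenient in a pipeline. The values here are placeholders:

# Supply the required variables via environment variables (placeholder values)
export PKR_VAR_image_gallery_name="gal_corp_images"
export PKR_VAR_image_gallery_resource_group="rg-p1-corp-packer-eus"
export PKR_VAR_image_definition_name="windows11-ent"

packer build -var "image_version=1.0.1" ./az-windows-11-ent.pkr.hcl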

Next, let’s implement the new output type. Make sure to remove the old managed image approach.

  // Shared Image Gallery configuration
  shared_image_gallery_destination {
    subscription         = "${var.subscription_id}"
    gallery_name         = "${var.image_gallery_name}"
    resource_group       = "${var.image_gallery_resource_group}"
    image_name           = "${var.image_definition_name}"
    image_version        = "${var.image_version}"
    storage_account_type = "Standard_LRS"
    target_region {
      name = "eastus"
    }
  }
  #   // OLD: Managed Image information
  #   managed_image_resource_group_name = "rg-p1-corp-packer-eus"
  #   managed_image_name                = "windows11-ent-packer-image-v1-eus"

In my case, I chose a federated authentication approach. This means I won't use any applications and secrets for authentication in this template. The Azure CLI will be authenticated during the GitHub workflow run, and Packer will use that session to authenticate the provider.

  // Authentication
  use_azure_cli_auth = true

Now we’ve created the Terraform deployment for our Azure Compute Gallery and updated the Packer template so it no longer outputs a managed image but publishes to the Compute Gallery instead.

Of course, you can make this much more dynamic by providing more regions for image replication, replica counts, and different storage account types, as sketched below.
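
As a sketch, a more elaborate destination block could look like the following. The regions, replica counts, and storage types are examples, and per-region settings require a reasonably recent azure-arm plugin, so check the plugin documentation for your version:

  shared_image_gallery_destination {
    subscription         = "${var.subscription_id}"
    gallery_name         = "${var.image_gallery_name}"
    resource_group       = "${var.image_gallery_resource_group}"
    image_name           = "${var.image_definition_name}"
    image_version        = "${var.image_version}"
    storage_account_type = "Standard_LRS"

    // Replicate to multiple regions with per-region settings
    target_region {
      name     = "eastus"
      replicas = 2
    }
    target_region {
      name                 = "westeurope"
      replicas             = 1
      storage_account_type = "Standard_ZRS"
    }
  }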


Pipeline Setup

Now the real fun begins… It's time to bring everything together and automate our Azure image builds using GitHub Actions. The pipeline we're setting up here will handle the entire workflow, from infrastructure deployment using Terraform, to the image creation with Packer, and finally publishing the image into the Azure Compute Gallery.

I have to say, the workflow I’ve built here might seem quite extensive at first glance, but it's designed this way to fit my specific needs and use cases. My main goal was clear: to automate the entire image-generation process, ensuring I always have fresh, ready-to-use Azure images for my training sessions, workshops, and demos.

The pipeline is divided into two key jobs:

  • Terraform deployment: This handles setting up and updating the necessary Azure infrastructure, including our Compute Gallery and image definitions. It uses federated identity for secure authentication, eliminating the need to manually manage sensitive credentials. The Terraform deployment outputs relevant information about the Azure Compute Gallery which will then be used in the next steps by Packer.

  • Packer build: After the infrastructure is set up, this job calculates the next image version (or uses an override if specified) and runs the Packer build to create a new version of the Azure Compute Gallery image.

➡️ Check out the full GitHub Actions Workflow here: https://github.com/lrottach/blog-az-image-packer/blob/main/.github/workflows/schedule-packer.yml

The first few lines configure things like the pipeline trigger, the permissions required for federated authentication, and some variables. I will cover the workflow_dispatch part later.

name: Schedule Packer automatic image build
on:
  schedule:
    # Run once per month on the 1st at 00:00 UTC
    - cron: '0 0 1 * *'
  workflow_dispatch:
    inputs:
      image_version_override:
        description: 'Optional: Override the automatic versioning (format: x.y.z)'
        required: false
        type: string

permissions:
  contents: read
  id-token: write
  pull-requests: write

env:
  ARM_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}
  ARM_SUBSCRIPTION_ID: ${{ vars.AZURE_SUBSCRIPTION_ID }}
  ARM_TENANT_ID: ${{ vars.AZURE_TENANT_ID }}
  ARM_USE_AZUREAD: true
  ARM_USE_OIDC: true
  PACKER_VERSION: "latest"
  TERRAFORM_VERSION: "latest"
  WORKING_DIR: ./src

Terraform Deploy

I won't go into too much detail here. The first job is straightforward: all the necessary resources are deployed, with the deployment validated, formatted, and executed. As mentioned, the required Terraform outputs are extracted and passed to the next job.
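
For reference, here is a condensed sketch of that job. The folder layout and step names are simplified compared to the repository:

terraform-deploy:
    name: Deploy Infrastructure
    environment: dev
    runs-on: ubuntu-latest
    # Job outputs hand the Terraform values over to the packer-build job
    outputs:
      resource_group_name: ${{ steps.tf_output.outputs.resource_group_name }}
      gallery_name: ${{ steps.tf_output.outputs.gallery_name }}
      image_definition_name: ${{ steps.tf_output.outputs.image_definition_name }}
      image_definition_id: ${{ steps.tf_output.outputs.image_definition_id }}
    defaults:
      run:
        working-directory: ${{ env.WORKING_DIR }}
    steps:

      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TERRAFORM_VERSION }}
          # The wrapper would break capturing 'terraform output -raw' below
          terraform_wrapper: false

      - name: Terraform Apply
        run: |
          terraform init
          terraform apply -auto-approve

      - name: Extract Terraform Outputs
        id: tf_output
        run: |
          echo "resource_group_name=$(terraform output -raw resource_group_name)" >> $GITHUB_OUTPUT
          echo "gallery_name=$(terraform output -raw gallery_name)" >> $GITHUB_OUTPUT
          echo "image_definition_name=$(terraform output -raw image_definition_name)" >> $GITHUB_OUTPUT
          echo "image_definition_id=$(terraform output -raw image_definition_id)" >> $GITHUB_OUTPUT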

Packer Build

Now, for my favorite part. First, we need to set up this job. As mentioned earlier, I will authenticate the Azure CLI using federated authentication. The Packer provider will then use that authenticated session to run.

packer-build:
    name: Build VM Image
    environment: dev
    runs-on: ubuntu-latest
    needs: terraform-deploy
    defaults:
      run:
        working-directory: ${{ env.WORKING_DIR }}
    steps:

      - name: Checkout code
        uses: actions/checkout@v4

      - name: Azure Login with OIDC
        uses: azure/login@v2
        with:
          client-id: ${{ env.ARM_CLIENT_ID }}
          tenant-id: ${{ env.ARM_TENANT_ID }}
          subscription-id: ${{ env.ARM_SUBSCRIPTION_ID }}

I've added some logic here. As you might know, running Packer with the same inputs repeatedly will fail because Packer can't create an image version that already exists. Since I want this pipeline to run without any manual intervention, we need logic that creates a new version to use as input for each run.

Some words about what is happening within the Get Image Version step. It dynamically retrieves and calculates the next version number for the image we're building by querying the Azure Compute Gallery for the latest existing version of our image definition. If it discovers an existing image version, it increments the patch number automatically (e.g., from 1.0.0 to 1.0.1). If no prior version exists yet, the pipeline starts with the default version 1.0.0. The calculated version number is then stored and passed on to subsequent pipeline steps, ensuring each Packer build produces a unique, incrementally versioned Azure image.

      - name: Get Image Version
        id: get_version
        run: |
          # Check if override version is provided
          OVERRIDE_VERSION="${{ github.event.inputs.image_version_override }}"

          if [[ -n "$OVERRIDE_VERSION" && "$OVERRIDE_VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
            echo "Using override version: $OVERRIDE_VERSION"
            NEW_VERSION="$OVERRIDE_VERSION"
          else
            # Get version from Azure Compute Gallery
            IMAGE_DEF_ID="${{ needs.terraform-deploy.outputs.image_definition_id }}"
            RESOURCE_GROUP="${{ needs.terraform-deploy.outputs.resource_group_name }}"
            GALLERY_NAME="${{ needs.terraform-deploy.outputs.gallery_name }}"
            IMAGE_DEF_NAME="${{ needs.terraform-deploy.outputs.image_definition_name }}"

            echo "Resource Group: $RESOURCE_GROUP"
            echo "Gallery Name: $GALLERY_NAME"
            echo "Image Definition: $IMAGE_DEF_NAME"

            # Query latest version (sort -V compares semantic versions
            # correctly, e.g. 1.0.10 > 1.0.9, unlike a plain string max)
            LATEST_VERSION=$(az sig image-version list \
              --resource-group $RESOURCE_GROUP \
              --gallery-name $GALLERY_NAME \
              --gallery-image-definition $IMAGE_DEF_NAME \
              --query "[].name" -o tsv | sort -V | tail -n 1)

            # If no version exists, set default
            if [ -z "$LATEST_VERSION" ] || [ "$LATEST_VERSION" == "null" ]; then
              LATEST_VERSION="1.0.0"
              echo "No existing versions found, starting with $LATEST_VERSION"
            else
              echo "Latest version found: $LATEST_VERSION"
            fi

            # Increment version
            IFS='.' read -ra VERSION_PARTS <<< "$LATEST_VERSION"
            MAJOR=${VERSION_PARTS[0]}
            MINOR=${VERSION_PARTS[1]}
            PATCH=$((${VERSION_PARTS[2]} + 1))
            NEW_VERSION="$MAJOR.$MINOR.$PATCH"

            echo "New version: $NEW_VERSION"
          fi

          echo "NEW_IMAGE_VERSION=$NEW_VERSION" >> $GITHUB_ENV
          echo "new_version=$NEW_VERSION" >> $GITHUB_OUTPUT

Every time the pipeline runs, it shows the previous image version and the newly calculated version, allowing you to verify them.

With the previous Terraform deployment and the newly calculated image version, we now have everything needed to run the Packer deployment.

      - name: Setup Packer
        uses: hashicorp/setup-packer@main
        with:
          version: ${{ env.PACKER_VERSION }}

      - name: Packer Initialize
        run: packer init ./az-windows-11-ent.pkr.hcl

      - name: Packer Validate
        run: packer validate ./az-windows-11-ent.pkr.hcl

      - name: Packer Build
        run: |
          echo "Building new image version: $NEW_IMAGE_VERSION"
          packer build -var "image_version=$NEW_IMAGE_VERSION" ./az-windows-11-ent.pkr.hcl

Manual Image changes

❓But what if we need to update the image manually, or what if we want to introduce a new major version?

Good questions. In most cases, simply running the image build pipeline repeatedly isn't enough. Sometimes, you might want to publish a new image yourself. Other times, you might make significant changes to the image template and need to increase the major image version.

To handle these situations, I've added an override feature. I can run the pipeline anytime and set my own image version. The great part is that the next scheduled run will recognize the overridden image version and continue incrementing from there as usual.
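
If you prefer the terminal over the GitHub UI, such an override run can be triggered with the GitHub CLI, assuming gh is installed and authenticated:

# Trigger a manual run with a version override
gh workflow run schedule-packer.yml -f image_version_override=2.0.0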

Now, if we check the Get Image Version step of such a run, we see that a version was provided using the override, so the usual version calculation magic is skipped this time.

Same here on the Packer Build step.

After the pipeline has finished, let’s take a final look at our Azure Compute Gallery to check that everything went fine. Here we go!
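
If you prefer to verify from the CLI instead of the portal, something like the following lists all published versions; the resource names are placeholders for your own:

# List all published versions of the image definition (placeholder names)
az sig image-version list \
  --resource-group rg-p1-corp-packer-eus \
  --gallery-name gal_corp_images \
  --gallery-image-definition windows11-ent \
  --query "[].{Version:name, State:provisioningState}" -o table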


Conclusion

In this post, we've automated our Azure image-building process by integrating GitHub Actions with HashiCorp Packer, Terraform, and Azure Compute Gallery. By moving away from manual builds, we've made image creation and management easier, saving time and ensuring consistency across deployments. Now, scheduled pipeline runs take care of everything automatically, providing us with version-controlled, always-up-to-date images for our Azure environments.

You can definitely expand on this example. What I've built here isn't an enterprise-ready setup, and if you want to manage multiple different images, it becomes a bit more complex; one possible direction is sketched below.
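
A natural first step would be a build matrix, so one workflow fans out over several Packer templates. This is only a sketch under the assumption that every image has its own template file; the second file name is hypothetical:

packer-build:
    strategy:
      matrix:
        template:
          - az-windows-11-ent.pkr.hcl
          - az-windows-server-2022.pkr.hcl # hypothetical second template
    steps:
      # ...checkout, login, and versioning steps as shown earlier...
      - name: Packer Build
        run: packer build -var "image_version=$NEW_IMAGE_VERSION" ./${{ matrix.template }}

Note that the version calculation would then also need to run per image definition.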

Inspiration 🛋️

That’s it for today. Before you leave, here are my recommendations for two amazing albums. Enjoy!

  • 🎧 The Sky, the Earth & All Between by Architects

  • 🎧 Arcane League of Legends: Season 2 (Soundtrack)

Written by

Lukas Rottach

I am an Azure Architect based in Switzerland, specializing in Azure Cloud technologies such as Azure Functions, Microsoft Graph, Azure Bicep and Terraform. My expertise lies in Infrastructure as Code, where I excel in automating and optimizing cloud infrastructures. With a strong passion for automation and development, I aim to share insights and practices to inspire and educate fellow tech enthusiasts. Join me on my journey through the dynamic world of the Azure cloud, where innovation meets efficiency.