GitHub Actions: A Transformative CI/CD Solution for Platform Engineers

Joby
7 min read

The Breaking Point

I still remember the day our deployment pipeline failed spectacularly. It was 3 AM, and my phone wouldn't stop buzzing with alerts. Our custom Jenkins setup, which had served reliably for years, had finally buckled under the weight of our growing infrastructure. Three different teams needed critical features deployed before the business day started, and our pipeline was completely jammed.

As I dragged myself out of bed and fired up my laptop, I couldn't help but think there had to be a better way. Our Jenkins server was a maintenance nightmare—custom plugins that no one fully understood, configuration drift between environments, and a constantly growing backlog of security patches we needed to apply. Not to mention the headache of managing the infrastructure it ran on.

That night, after patching things together just in time for the morning standup, our team decided to explore GitHub Actions as an alternative.

Exploring GitHub Actions

GitHub Actions had been evolving steadily since its release. We were already using GitHub for source control, so the integrated workflow seemed appealing, but there was hesitation about migrating complex CI/CD pipelines to a relatively new service.

The first pilot project was simple: a small Terraform module that managed some non-critical GCP resources. Creating the workflow was refreshingly straightforward—just a YAML file in the .github/workflows directory:

name: Terraform Apply

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init

      - name: Terraform Format
        run: terraform fmt -check

      - name: Terraform Plan
        run: terraform plan

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -auto-approve

Within a day, the team had a working pipeline that was simpler, more reliable, and required zero infrastructure maintenance.
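
One refinement worth considering once the pilot works is to apply a saved plan file, so that what gets applied is exactly what was planned. The following is a minimal sketch of that variant, not the pipeline the team actually ran; the workflow name and the tfplan file name are assumptions, and cloud credentials are omitted just as in the example above.

```yaml
# Sketch: plan once, then apply exactly that plan on pushes to main.
name: Terraform Apply (saved plan)

on:
  push:
    branches: [ main ]

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init -input=false

      # Write the plan to a file so the apply step cannot drift from it.
      - name: Terraform Plan
        run: terraform plan -input=false -out=tfplan

      # Applying a saved plan file does not prompt, so -auto-approve is not needed.
      - name: Terraform Apply
        run: terraform apply -input=false tfplan
```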

The Clear Benefits

As more workloads migrated to GitHub Actions, the advantages became increasingly obvious:

Zero infrastructure maintenance: With GitHub-hosted runners, there were no servers to patch, no Docker hosts to manage, and no CI/CD capacity to scale by hand during busy periods.

Seamless GitHub integration: Pull request checks, branch protections, and deployment environments all tied directly into existing GitHub workflows.

Marketplace actions: Rather than writing complex scripts, teams could leverage pre-built actions for common tasks. Need to push a container to GCP Artifact Registry? There's an action for that. Want to run security scans on your Terraform code? There's an action for that too.
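
As an illustration of the Artifact Registry case, here is a hedged sketch built on the google-github-actions/auth and setup-gcloud actions. The region, project, repository, image name, and secret names are all hypothetical, and it assumes Workload Identity Federation has already been configured on the GCP side.

```yaml
name: Build and Push Image

on:
  push:
    branches: [ main ]

permissions:
  contents: read
  id-token: write   # needed for keyless (OIDC) authentication to Google Cloud

env:
  # Hypothetical values: replace with your own region, project, and repository.
  REGISTRY: us-central1-docker.pkg.dev
  IMAGE: us-central1-docker.pkg.dev/my-project/my-repo/my-app

jobs:
  push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }}

      - name: Setup gcloud
        uses: google-github-actions/setup-gcloud@v2

      # Registers gcloud as a Docker credential helper for the registry host.
      - name: Configure Docker for Artifact Registry
        run: gcloud auth configure-docker "$REGISTRY" --quiet

      - name: Build and push
        run: |
          docker build -t "$IMAGE:$GITHUB_SHA" .
          docker push "$IMAGE:$GITHUB_SHA"
```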

Matrix builds: Testing across multiple platforms and language versions became trivial:

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest]
        node-version: [14.x, 16.x, 18.x]
    steps:
      - uses: actions/checkout@v3
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
      - run: npm ci
      - run: npm test

Self-hosted runners on cloud VMs: For our specific workloads, we set up self-hosted runners on GCP and Azure VMs. This approach gave us complete control over the compute resources while still benefiting from GitHub Actions' orchestration capabilities. It also allowed us to:

  • Use custom VM sizes optimized for our specific workloads

  • Maintain runners in the same VPC as our other cloud resources for secure access

  • Implement auto-scaling for our runner pools based on workflow queue depth

  • Benefit from cloud provider discounts for committed-use VMs
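
Routing a job to a particular runner pool is done with labels. The fragment below is a sketch under assumptions: gcp-highmem is a hypothetical custom label, and the build script path is made up for illustration.

```yaml
jobs:
  build:
    # A runner must carry every label listed here to pick up the job.
    # self-hosted, linux, and x64 are assigned automatically at registration;
    # gcp-highmem is a hypothetical custom label for a large-memory pool.
    runs-on: [self-hosted, linux, x64, gcp-highmem]
    # Cap the job so a hung build does not hold a VM indefinitely.
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v3
      - name: Build
        run: ./scripts/build.sh   # hypothetical build script
```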

Unified secret management: No more juggling credentials between different systems. GitHub's encrypted secrets, defined at the organization, repository, or environment level, were available to every workflow that needed them.
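
A short sketch of how a job consumes a secret. The environment name, secret name, and deploy script are hypothetical; the point is only that the secret is mapped into an environment variable for the one step that needs it.

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    # Hypothetical environment. If a secret with the same name exists at several
    # levels, the environment-level value takes precedence over repository and
    # organization values.
    environment: production
    steps:
      - uses: actions/checkout@v3
      - name: Deploy
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}   # hypothetical secret name
        run: ./scripts/deploy.sh                      # hypothetical script
```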

Real-World Workflows That Made a Difference

One of the most impactful implementations was a full end-to-end IaC pipeline that handled Terraform validation, security scanning, cost estimation, and deployment—all triggered automatically from pull requests:

name: Infrastructure Pipeline

on:
  pull_request:
    paths:
      - 'terraform/**'
      - '.github/workflows/terraform.yml'

jobs:
  validate:
    runs-on: self-hosted-cloud-vm
    steps:
      - uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init -backend=false

      - name: Terraform Validate
        run: terraform validate

  security:
    needs: validate
    runs-on: self-hosted-cloud-vm
    steps:
      - uses: actions/checkout@v3

      - name: Run tfsec
        uses: aquasecurity/tfsec-action@v1.0.0

      - name: Run checkov
        # Tracking master pulls in unreviewed changes; pin a release tag or commit SHA instead
        uses: bridgecrewio/checkov-action@master

  cost:
    needs: security
    runs-on: self-hosted-cloud-vm
    steps:
      - uses: actions/checkout@v3

      - name: Setup Infracost
        uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Generate Infracost cost estimate
        run: infracost breakdown --path . --format json > infracost.json

      - name: Post Infracost comment
        uses: infracost/actions/comment@v2
        with:
          path: infracost.json

This workflow gave developers immediate feedback on their infrastructure changes, highlighting security issues and potential cost increases before code was even merged—a massive improvement over previous approaches.
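
A pipeline that comments on pull requests needs a token allowed to write those comments. The fragment below is a sketch of a top-level permissions block that could sit beside the on: section of a workflow like this one; whether it is required depends on the repository's default token permissions, which is an assumption here. Pull requests from forks receive a read-only token regardless.

```yaml
# Top-level block: applies to every job in the workflow unless a job overrides it.
permissions:
  contents: read         # enough for actions/checkout
  pull-requests: write   # lets a step post or update a comment on the pull request
```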

For Azure workloads, we built a workflow that authenticated to Azure from the same cloud VM runners, standardising the deployment process regardless of cloud provider:

name: Deploy to Azure

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: self-hosted-cloud-vm
    steps:
      - uses: actions/checkout@v3

      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init

      - name: Terraform Apply
        run: terraform apply -auto-approve
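
If storing a long-lived service principal secret is a concern, azure/login also accepts OpenID Connect inputs. This is a sketch of that variant, not the workflow described above: the secret names are hypothetical, and it assumes a federated credential has been set up for the repository and that the azurerm provider version in use supports the ARM_USE_OIDC setting.

```yaml
name: Deploy to Azure (OIDC)

on:
  push:
    branches: [ main ]

permissions:
  contents: read
  id-token: write   # lets the job request an OIDC token

jobs:
  deploy:
    runs-on: self-hosted-cloud-vm
    env:
      # Read by the Terraform azurerm provider.
      ARM_USE_OIDC: "true"
      ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
      ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
      ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
    steps:
      - uses: actions/checkout@v3

      - name: Azure Login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2

      - name: Terraform Init
        run: terraform init -input=false

      - name: Terraform Apply
        run: terraform apply -input=false -auto-approve
```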

Cloud VM Runners: Implementation Details

Our cloud VM runner setup evolved to become quite sophisticated:

VM provisioning automation: We used Terraform to manage the lifecycle of our runner VMs, with separate configurations for GCP and Azure. This included:

  • Compute instance configurations

  • Networking and security settings

  • Identity and access management

  • Auto-scaling instance groups

Runner installation: A startup script automatically installed and configured the GitHub Actions runner software, registering the VM with the correct repositories or organizations.

VM image management: We maintained custom VM images with pre-installed dependencies to reduce startup time and ensure consistency.
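
The runner installation step could look roughly like the cloud-init file below. It is a sketch only, not our actual startup script: it assumes an image with cloud-init, sudo, and curl, and the runner version, organization URL, label, and token path are placeholders. How the registration token reaches the VM is deliberately left open.

```yaml
#cloud-config
# Sketch: install and register an ephemeral GitHub Actions runner at first boot.
users:
  - default
  - name: runner          # config.sh refuses to run as root, so use a plain user
    shell: /bin/bash

write_files:
  - path: /opt/runner-setup.sh
    permissions: "0755"
    content: |
      #!/bin/bash
      set -euo pipefail
      RUNNER_VERSION="<runner-version>"           # placeholder: pick a current release
      RUNNER_URL="https://github.com/<your-org>"  # placeholder: org or repository URL
      RUNNER_TOKEN="$(cat /run/runner-token)"     # assumed to be delivered out of band

      mkdir -p /opt/actions-runner && cd /opt/actions-runner
      curl -fsSL -o runner.tar.gz \
        "https://github.com/actions/runner/releases/download/v${RUNNER_VERSION}/actions-runner-linux-x64-${RUNNER_VERSION}.tar.gz"
      tar xzf runner.tar.gz
      chown -R runner:runner /opt/actions-runner

      # --ephemeral: the runner accepts one job and then deregisters itself.
      sudo -u runner ./config.sh --unattended --ephemeral \
        --url "$RUNNER_URL" --token "$RUNNER_TOKEN" \
        --labels self-hosted-cloud-vm
      sudo -u runner ./run.sh

runcmd:
  - /opt/runner-setup.sh
```

Because run.sh returns once the single job has finished, the instance group or an external scaler can then delete the VM and start a fresh one.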

Cost optimization strategies:

  • Spot/preemptible instances for non-critical workflows

  • Instance right-sizing based on workflow requirements

  • Auto-scaling based on GitHub workflow queue metrics

  • Scheduled scale-down during off-hours

Security considerations:

  • Ephemeral runners that take a single job and are then replaced with a fresh VM

  • Private networking with controlled egress

  • Just-in-time access for runners to cloud resources

  • Regular rotation of runner registration tokens

Challenges and Lessons Learned

Despite the many benefits, the transition wasn't without challenges:

Billing surprises: While GitHub Actions offers generous free minutes, larger projects quickly exceeded them. Setting up billing alerts and monitoring usage became essential. With our cloud VM approach, we needed to carefully monitor both GitHub billing and cloud provider costs.

Workflow organisation: As teams scaled to dozens of repositories, maintaining consistent workflows became challenging. Creating a central repository of reusable workflows and standardising approaches helped address this issue.
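
A reusable workflow is an ordinary workflow file with a workflow_call trigger. The two files below sketch the pattern; the my-org/shared-workflows repository, the file names, and the working-directory input are hypothetical.

```yaml
# In the shared repository: .github/workflows/terraform-validate.yml
name: Terraform Validate (reusable)

on:
  workflow_call:
    inputs:
      working-directory:
        type: string
        default: terraform

jobs:
  validate:
    runs-on: ubuntu-latest
    env:
      TF_DIR: ${{ inputs.working-directory }}
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
      - run: terraform -chdir="$TF_DIR" init -backend=false
      - run: terraform -chdir="$TF_DIR" validate
```

```yaml
# In each consuming repository: .github/workflows/ci.yml
name: CI

on:
  pull_request:

jobs:
  terraform:
    # Pin to a tag or commit SHA instead of main for stricter change control.
    uses: my-org/shared-workflows/.github/workflows/terraform-validate.yml@main
    with:
      working-directory: infra
```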

Complex dependencies: Some builds required unusual dependencies or large binary files. Leveraging GitHub's cache action helped avoid repeatedly downloading these resources. For our cloud VM runners, we created custom VM images with commonly used dependencies pre-installed.
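
A minimal sketch of the caching pattern with actions/cache. The vendor/tools directory, the tools.lock manifest, and the fetch script are hypothetical stand-ins for whatever large dependencies a build needs.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # The key changes whenever the manifest changes, which forces a fresh download.
      - name: Restore large build dependencies
        id: deps-cache
        uses: actions/cache@v4
        with:
          path: vendor/tools
          key: tools-${{ runner.os }}-${{ hashFiles('tools.lock') }}

      # Only hit the network when the cache did not contain an exact match.
      - name: Download dependencies on cache miss
        if: steps.deps-cache.outputs.cache-hit != 'true'
        run: ./scripts/fetch-tools.sh
```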

Rate limiting: During peak development periods, teams occasionally hit API rate limits. Implementing thoughtful workflow triggers and consolidating jobs helped mitigate this.
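
Two settings go a long way toward cutting redundant runs: path filters and a concurrency group. The fragment below is a sketch; the terraform/** path is an assumption about repository layout.

```yaml
name: Terraform Checks

on:
  pull_request:
    # Skip the workflow entirely when nothing under terraform/ changed.
    paths:
      - 'terraform/**'

# One run per workflow and branch: a new push cancels the run still in progress.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: hashicorp/setup-terraform@v2
      - run: terraform -chdir=terraform init -backend=false
      - run: terraform -chdir=terraform validate
```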

VM provisioning delays: When auto-scaling kicked in, new runners sometimes took several minutes to come online. We implemented a warm pool of standby runners to mitigate this issue.

Who Should Try GitHub Actions

Based on this experience with GitHub Actions in production, here's who should consider making the switch:

Definitely try it if:

  • You're already using GitHub for source control

  • Your team is spending significant time maintaining CI/CD infrastructure

  • You want to democratise CI/CD and empower developers to own their pipelines

  • You need to test across multiple platforms or language versions

  • You're looking to standardise deployments across multiple cloud providers

  • You want the flexibility to run workflows on cloud VMs under your control

Proceed with caution if:

  • Your builds require very specialised hardware or extremely large resource allocations

  • You have existing investments in enterprise CI/CD tools with deep integration into other systems

  • Regulatory requirements demand complete control over the build infrastructure

  • Your workflows need to run in completely air-gapped environments

Conclusion: The Community Benefits

The most surprising outcome of GitHub Actions adoption wasn't just the technical improvements—it was the cultural shift. Developers who previously viewed CI/CD as "operations territory" began owning their pipelines, experimenting with optimisations, and sharing workflow improvements across teams.

Organizations that make this kind of move often report more frequent deployments and faster recovery from failed ones. Code reviews become more meaningful as automated checks catch common issues before human reviewers see the code.

For platform engineers, GitHub Actions represents more than just another CI/CD tool—it's a step toward the self-service infrastructure that modern organizations need. By eliminating the maintenance overhead of traditional CI/CD systems, platform teams can focus on higher-value platform work that truly moves businesses forward.

Running GitHub Actions on cloud VMs offers the best of both worlds: the simplicity and integration of GitHub's platform with the control and customization of your own infrastructure. This hybrid approach has proven particularly valuable for teams with specialized compute requirements or strict security and compliance needs.

If you're drowning in CI/CD infrastructure maintenance or tired of fighting with opaque build systems, GitHub Actions is worth exploring. Start small, perhaps with a simple workflow running alongside your existing pipeline, and see whether it delivers the kind of improvement described here.

That 3 AM incident response might become a thing of the past—and your on-call engineers will thank you for it.
