GitHub Actions: A Transformative CI/CD Solution for Platform Engineers


The Breaking Point
I still remember the day our deployment pipeline failed spectacularly. It was 3 AM, and my phone wouldn't stop buzzing with alerts. Our custom Jenkins setup, which had served reliably for years, had finally buckled under the weight of our growing infrastructure. Three different teams needed critical features deployed before the business day started, and our pipeline was completely jammed.
As I dragged myself out of bed and fired up my laptop, I couldn't help but think there had to be a better way. Our Jenkins server was a maintenance nightmare—custom plugins that no one fully understood, configuration drift between environments, and a constantly growing backlog of security patches we needed to apply. Not to mention the headache of managing the infrastructure it ran on.
That night, after patching things together just in time for the morning standup, our team decided to explore GitHub Actions as an alternative.
Exploring GitHub Actions
GitHub Actions had been evolving steadily since its release. We were already using GitHub for source control, so the integrated workflow seemed appealing, but there was hesitation about migrating complex CI/CD pipelines to a relatively new service.
The first pilot project was simple: a small Terraform module that managed some non-critical GCP resources. Creating the workflow was refreshingly straightforward. All it took was a YAML file in the .github/workflows directory:
    name: Terraform Apply
    on:
      push:
        branches: [ main ]
      pull_request:
        branches: [ main ]
    jobs:
      terraform:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          - name: Setup Terraform
            uses: hashicorp/setup-terraform@v2
          - name: Terraform Init
            run: terraform init
          # Fails the job if any file is not canonically formatted
          - name: Terraform Format
            run: terraform fmt -check
          - name: Terraform Plan
            run: terraform plan
          # Pull requests stop after the plan; only pushes to main apply
          - name: Terraform Apply
            if: github.ref == 'refs/heads/main' && github.event_name == 'push'
            run: terraform apply -auto-approve
Within a day, the team had a working pipeline that was simpler, more reliable, and required zero infrastructure maintenance.
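One refinement to the pilot workflow, offered as a sketch rather than a record of what we ran: save the plan to a file and apply exactly that file, so the apply step cannot pick up changes that appeared after the plan was computed. The `tfplan` file name and the `concurrency` group name are assumptions, and the sketch assumes cloud credentials and a remote state backend are already configured.

```yaml
name: Terraform Apply (saved plan)
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
# Assumed group name: queues runs for the same ref instead of overlapping them
concurrency:
  group: terraform-${{ github.ref }}
  cancel-in-progress: false
jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      - name: Terraform Init
        run: terraform init -input=false
      # Write the plan to a file (assumed name: tfplan)
      - name: Terraform Plan
        run: terraform plan -input=false -out=tfplan
      # Applying a saved plan file needs no -auto-approve
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: terraform apply -input=false tfplan
```

Plan and apply stay in the same job here, so the plan file is still on disk when the apply step runs.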
The Clear Benefits
As more workloads migrated to GitHub Actions, the advantages became increasingly obvious:
Zero infrastructure maintenance: No more patching servers, managing Docker hosts, or worrying about scaling the CI/CD infrastructure during busy periods.
Seamless GitHub integration: Pull request checks, branch protections, and deployment environments all tied directly into existing GitHub workflows.
Marketplace actions: Rather than writing complex scripts, teams could leverage pre-built actions for common tasks. Need to push a container to GCP Artifact Registry? There's an action for that. Want to run security scans on your Terraform code? There's an action for that too.
Matrix builds: Testing across multiple platforms and language versions became trivial:
    jobs:
      test:
        runs-on: ${{ matrix.os }}
        strategy:
          matrix:
            os: [ubuntu-latest, windows-latest]
            node-version: [14.x, 16.x, 18.x]
        steps:
          - uses: actions/checkout@v3
          - name: Use Node.js ${{ matrix.node-version }}
            uses: actions/setup-node@v3
            with:
              node-version: ${{ matrix.node-version }}
          - run: npm ci
          - run: npm test
Self-hosted runners on cloud VMs: For our specific workloads, we set up self-hosted runners on GCP and Azure VMs. This approach gave us complete control over the compute resources while still benefiting from GitHub Actions' orchestration capabilities. It also allowed us to:
- Use custom VM sizes optimized for our specific workloads
- Maintain runners in the same VPC as our other cloud resources for secure access
- Implement auto-scaling for our runner pools based on workflow queue depth
- Benefit from cloud provider discounts for committed-use VMs
Unified secret management: No more juggling credentials between different systems—GitHub's encrypted secrets worked across all workflows.
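To make the Marketplace and secrets points concrete, here is a minimal sketch of pushing a container image to Artifact Registry using only published actions. It is not the workflow we ran. It assumes Workload Identity Federation is already set up on the GCP side, and the secret names (`GCP_WIF_PROVIDER`, `GCP_SERVICE_ACCOUNT`), the `GCP_PROJECT` variable, the `us-docker.pkg.dev` host, and the `example-repo/example-app` image path are all hypothetical.

```yaml
name: Build and push image
on:
  push:
    branches: [ main ]
permissions:
  contents: read
  id-token: write   # lets the job request an OIDC token for Workload Identity Federation
jobs:
  push-image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Exchange the OIDC token for a short-lived GCP access token
      - id: auth
        uses: google-github-actions/auth@v2
        with:
          token_format: access_token
          workload_identity_provider: ${{ secrets.GCP_WIF_PROVIDER }}   # hypothetical secret
          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }}           # hypothetical secret
      # Log Docker in to Artifact Registry with that token
      - uses: docker/login-action@v3
        with:
          registry: us-docker.pkg.dev
          username: oauth2accesstoken
          password: ${{ steps.auth.outputs.access_token }}
      # Build from the repository root and push, tagged with the commit SHA
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: us-docker.pkg.dev/${{ vars.GCP_PROJECT }}/example-repo/example-app:${{ github.sha }}
```

No long-lived key is stored in this variant; the secrets only hold identifiers, and the token issued to the job expires on its own.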
Real-World Workflows That Made a Difference
One of the most impactful implementations was a full end-to-end IaC pipeline that handled Terraform validation, security scanning, cost estimation, and deployment—all triggered automatically from pull requests:
    name: Infrastructure Pipeline
    on:
      pull_request:
        paths:
          - 'terraform/**'
          - '.github/workflows/terraform.yml'
    jobs:
      validate:
        runs-on: self-hosted-cloud-vm
        steps:
          - uses: actions/checkout@v3
          - name: Setup Terraform
            uses: hashicorp/setup-terraform@v2
          # -backend=false: validation does not need access to remote state
          - name: Terraform Init
            run: terraform init -backend=false
          - name: Terraform Validate
            run: terraform validate
      security:
        needs: validate
        runs-on: self-hosted-cloud-vm
        steps:
          - uses: actions/checkout@v3
          - name: Run tfsec
            uses: aquasecurity/tfsec-action@v1.0.0
          # Tracks a moving branch; consider pinning to a release tag or commit SHA
          - name: Run checkov
            uses: bridgecrewio/checkov-action@master
      cost:
        needs: security
        runs-on: self-hosted-cloud-vm
        steps:
          - uses: actions/checkout@v3
          - name: Setup Infracost
            uses: infracost/actions/setup@v2
            with:
              api-key: ${{ secrets.INFRACOST_API_KEY }}
          - name: Generate Infracost cost estimate
            run: infracost breakdown --path . --format json > infracost.json
          - name: Post Infracost comment
            uses: infracost/actions/comment@v2
            with:
              path: infracost.json
This workflow gave developers immediate feedback on their infrastructure changes, highlighting security issues and potential cost increases before code was even merged—a massive improvement over previous approaches.
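The comment step can also be done with the Infracost CLI that the setup action installs, which makes the required permission explicit. This is a sketch of that alternative under the same path filter, not a description of the pipeline above; it runs on a hosted runner only to stay self-contained.

```yaml
name: Infracost comment (CLI variant)
on:
  pull_request:
    paths:
      - 'terraform/**'
permissions:
  contents: read
  pull-requests: write   # needed to post or update the pull request comment
jobs:
  cost:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Infracost
        uses: infracost/actions/setup@v2
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      - name: Generate Infracost cost estimate
        run: infracost breakdown --path . --format json --out-file infracost.json
      # --behavior update edits the existing comment instead of adding a new one per push
      - name: Post Infracost comment
        run: |
          infracost comment github --path infracost.json \
            --repo "$GITHUB_REPOSITORY" \
            --pull-request "${{ github.event.pull_request.number }}" \
            --github-token "${{ github.token }}" \
            --behavior update
```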
For Azure workloads, we implemented a workflow that logs in with the azure/login action and then runs the same Terraform steps, standardising the deployment process regardless of cloud provider:
    name: Deploy to Azure
    on:
      push:
        branches: [ main ]
    jobs:
      deploy:
        runs-on: self-hosted-cloud-vm
        steps:
          - uses: actions/checkout@v3
          - name: Azure Login
            uses: azure/login@v1
            with:
              creds: ${{ secrets.AZURE_CREDENTIALS }}
          - name: Setup Terraform
            uses: hashicorp/setup-terraform@v2
          - name: Terraform Init
            run: terraform init
          - name: Terraform Apply
            run: terraform apply -auto-approve
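Applying with -auto-approve on every push to main leaves no human checkpoint. One way to add one, sketched here under assumptions: an environment named `production` exists in the repository settings with required reviewers configured, and a `deploy-azure` concurrency group keeps two deployments from running at once. Both names are hypothetical; the rest mirrors the workflow above.

```yaml
name: Deploy to Azure (gated)
on:
  push:
    branches: [ main ]
# Hypothetical group name: a second push waits instead of racing the first
concurrency:
  group: deploy-azure
  cancel-in-progress: false
jobs:
  deploy:
    runs-on: self-hosted-cloud-vm
    # Hypothetical environment; the job pauses until a required reviewer approves
    environment: production
    steps:
      - uses: actions/checkout@v3
      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      - name: Terraform Init
        run: terraform init -input=false
      - name: Terraform Apply
        run: terraform apply -input=false -auto-approve
```

Secrets can also be scoped to the environment, so the Azure credentials are only released to jobs that have passed the approval.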
Cloud VM Runners: Implementation Details
Our cloud VM runner setup evolved to become quite sophisticated:
VM provisioning automation: We used Terraform to manage the lifecycle of our runner VMs, with separate configurations for both GCP and Azure. This included:
- Compute instance configurations
- Networking and security settings
- Identity and access management
- Auto-scaling instance groups
Runner installation: A startup script automatically installed and configured the GitHub Actions runner software, registering the VM with the correct repositories or organizations.
VM image management: We maintained custom VM images with pre-installed dependencies to reduce startup time and ensure consistency.
Cost optimization strategies:
- Spot/preemptible instances for non-critical workflows
- Instance right-sizing based on workflow requirements
- Auto-scaling based on GitHub workflow queue metrics
- Scheduled scale-down during off-hours
Security considerations:
- Ephemeral runners that reset after each job
- Private networking with controlled egress
- Just-in-time access for runners to cloud resources
- Regular rotation of runner registration tokens
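On the workflow side, splitting runners into pools mostly comes down to labels. The sketch below routes interruptible checks to a spot-backed pool and the deployment to an on-demand pool. The `spot` and `on-demand` labels are hypothetical names that would be assigned when the runners register, and the sketch assumes Terraform is baked into the custom VM image.

```yaml
name: Runner pool routing (sketch)
on:
  pull_request:
  push:
    branches: [ main ]
jobs:
  checks:
    # A job is picked up only by a runner that carries every listed label
    runs-on: [self-hosted, linux, spot]
    # Bounds how long a job can hang if its spot VM is reclaimed mid-run
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v3
      - name: Terraform Format
        run: terraform fmt -check -recursive
  deploy:
    needs: checks
    if: github.event_name == 'push'
    # Deployments should not be interrupted, so they go to the on-demand pool
    runs-on: [self-hosted, linux, on-demand]
    steps:
      - uses: actions/checkout@v3
      - name: Terraform Init
        run: terraform init -input=false
      - name: Terraform Apply
        run: terraform apply -input=false -auto-approve
```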
Challenges and Lessons Learned
Despite the many benefits, the transition wasn't without challenges:
Billing surprises: While GitHub Actions offers generous free minutes, larger projects quickly exceeded them. Setting up billing alerts and monitoring usage became essential. With our cloud VM approach, we needed to carefully monitor both GitHub billing and cloud provider costs.
Workflow organisation: As teams scaled to dozens of repositories, maintaining consistent workflows became challenging. Creating a central repository of reusable workflows and standardising approaches helped address this issue.
Complex dependencies: Some builds required unusual dependencies or large binary files. Leveraging GitHub's cache action helped avoid repeatedly downloading these resources. For our cloud VM runners, we created custom VM images with commonly used dependencies pre-installed.
Rate limiting: During peak development periods, teams occasionally hit API rate limits. Implementing thoughtful workflow triggers and consolidating jobs helped mitigate this.
VM provisioning delays: When auto-scaling kicked in, new runners sometimes took several minutes to come online. We implemented a warm pool of standby runners to mitigate this issue.
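The central repository of reusable workflows can be sketched as two files: a workflow that declares `workflow_call`, and a thin caller in each consuming repository. The `example-org/ci-workflows` repository, the file name, and the `working_directory` input are hypothetical. The caller also shows the trigger hygiene that helped with rate limits: a path filter, plus a concurrency group that cancels superseded runs.

```yaml
# File 1 (hypothetical central repo example-org/ci-workflows):
# .github/workflows/terraform-validate.yml
name: Reusable Terraform Validate
on:
  workflow_call:
    inputs:
      working_directory:
        description: Directory containing the Terraform configuration
        type: string
        required: false
        default: terraform
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
      - name: Terraform Init
        working-directory: ${{ inputs.working_directory }}
        run: terraform init -backend=false
      - name: Terraform Validate
        working-directory: ${{ inputs.working_directory }}
        run: terraform validate
---
# File 2 (any consuming repo): .github/workflows/infra-checks.yml
name: Infra checks
on:
  pull_request:
    paths:
      - 'terraform/**'   # skip runs for changes that cannot affect infrastructure
# A newer push to the same ref cancels the run it supersedes
concurrency:
  group: infra-checks-${{ github.ref }}
  cancel-in-progress: true
jobs:
  validate:
    uses: example-org/ci-workflows/.github/workflows/terraform-validate.yml@main
    with:
      working_directory: terraform
```

Pinning the caller to a tag or commit SHA instead of `@main` trades automatic updates for reproducibility; which is preferable depends on how much the central workflows change.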
Who Should Try GitHub Actions
Based on this experience with GitHub Actions in production, here's who should consider making the switch:
Definitely try it if:
- You're already using GitHub for source control
- Your team is spending significant time maintaining CI/CD infrastructure
- You want to democratise CI/CD and empower developers to own their pipelines
- You need to test across multiple platforms or language versions
- You're looking to standardise deployments across multiple cloud providers
- You want the flexibility to run workflows on cloud VMs under your control
Proceed with caution if:
- Your builds require very specialised hardware or extremely large resource allocations
- You have existing investments in enterprise CI/CD tools with deep integration into other systems
- Regulatory requirements demand complete control over the build infrastructure
- Your workflows need to run in completely air-gapped environments
Conclusion: The Community Benefits
The most surprising outcome of GitHub Actions adoption wasn't just the technical improvements—it was the cultural shift. Developers who previously viewed CI/CD as "operations territory" began owning their pipelines, experimenting with optimisations, and sharing workflow improvements across teams.
Organizations that make this shift often report more frequent deployments and faster recovery from failed ones, though the size of the effect varies from team to team. Code reviews also become more meaningful, because automated checks catch common issues before human reviewers see the code.
For platform engineers, GitHub Actions represents more than just another CI/CD tool—it's a step toward the self-service infrastructure that modern organizations need. By eliminating the maintenance overhead of traditional CI/CD systems, platform teams can focus on higher-value platform work that truly moves businesses forward.
Running GitHub Actions on cloud VMs offers the best of both worlds: the simplicity and integration of GitHub's platform with the control and customization of your own infrastructure. This hybrid approach has proven particularly valuable for teams with specialized compute requirements or strict security and compliance needs.
If you're drowning in CI/CD infrastructure maintenance or tired of fighting with opaque build systems, GitHub Actions is worth exploring. Start small, perhaps with a simple workflow running alongside your existing pipeline, and see whether it delivers the kind of improvements described here.
That 3 AM incident response might become a thing of the past—and your on-call engineers will thank you for it.