Using GCP Secret Manager With Container-Optimized OS VMs
Using the Google Cloud Platform Secret Manager service is a convenient way to store and manage secrets. It has many features like automatic rotation, audit logging, and access control. It is integrated with GCP services like Cloud Functions, Cloud Run, GKE (and probably some others), provides a REST API to access secrets, and has client libraries for many programming languages.
Sadly, there seems to be no easy way to use it with Google Compute Engine VMs, especially with the Container-Optimized OS (COS) VM image, which I use quite often when dealing with containerized applications.
This is a guide on how to integrate GCP Secret Manager with a Container-Optimized OS VM.
Assumptions
The gcloud CLI tool is installed, and you are authenticated in GCP.
You have a fresh and empty project in GCP that you have permissions to manage (like owner or a similar role).
Docker and OpenTofu are installed on your machine.
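If you are not authenticated yet, something like the following should do (the project ID used throughout this guide is a placeholder):

gcloud auth login
gcloud config set project my-app-project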
The plan
Infrastructure configuration will be kept in Terraform modules, and OpenTofu will be used to apply changes.
Terraform config will be split into two modules:
base - infrastructure that does not change on deploy, like project services, Artifact Registry, VPC configuration, Secret Manager secret, service accounts, IAM permissions, etc.
deploy - infrastructure that is applied on each deploy, like VM configuration, cloud-init, container specification, etc.
A simple bash script will be created that reads the secret from a file and prints it to stdout. The script will be containerized and pushed to Artifact Registry.
Applying the Terraform deploy module will create a VM with Container-Optimized OS that pulls the secret from Secret Manager and runs the application container with the secret mounted inside.
While running, the application will generate logs that should contain the secret contents.
Terraform setup
Let's create a new Terraform configuration inside the .terraform/ directory at the root of the project. It should contain two sub-modules, base and deploy, and a root module configuration that will use both of them.
Base sub-module
Base infrastructure should contain all the necessary resources that will be used by the deploy sub-module. Those resources should not change often. Base infrastructure contains:
Enabled project services like Secret Manager, Artifact Registry, Compute Engine API, etc.
Service account that will be used as the runtime identity for the VM.
Secret Manager secret with some secret value.
Artifact Registry repository to store the application Docker image.
IAM permissions for the service account to access the secret, pull the app image from Artifact Registry, write logs, etc.
VPC network and subnet configuration with Private Google Access enabled, to allow a VM with a private IP to access GCP resources.
Directory structure:
.terraform/base
├── artifact_registry.tf # Artifact Registry repository configuration
├── outputs.tf # Outputs definition, needed by deploy module
├── project_iam.tf # Project IAM permissions configuration
├── project_services.tf # Project services configuration
├── providers.tf # Required providers configuration
├── secret_manager.tf # Secret Manager secret configuration
├── variables.tf # Required variables definition
└── vpc.tf # VPC network and subnet configuration
.terraform/base/variables.tf
variable "project_id" {
description = "The GCP project ID"
}
variable "app_name" {
description = "The application name, e.g.: My App"
}
variable "app_slug" {
description = "The application slug, e.g. my-app"
}
variable "region" {
description = "The GCP region to use for subnetwork"
}
.terraform/base/providers.tf
Specify the required providers.
terraform {
required_providers {
google = {
source = "hashicorp/google"
}
time = {
source = "hashicorp/time"
}
}
}
.terraform/base/project_services.tf
GCP service APIs need to be enabled in the project before they can be used.
resource "google_project_service" "compute" {
project = var.project_id
service = "compute.googleapis.com"
disable_on_destroy = false
}
resource "google_project_service" "iam" {
project = var.project_id
service = "iam.googleapis.com"
disable_on_destroy = false
}
resource "google_project_service" "secretmanager" {
project = var.project_id
service = "secretmanager.googleapis.com"
disable_on_destroy = false
}
resource "google_project_service" "artifactregistry" {
project = var.project_id
service = "artifactregistry.googleapis.com"
disable_on_destroy = false
}
# Wait for services to be ready
resource "time_sleep" "services_ready" {
depends_on = [
google_project_service.compute,
google_project_service.iam,
google_project_service.secretmanager,
google_project_service.artifactregistry,
]
create_duration = "1m"
}
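You can double-check which APIs are enabled in the project at any point with gcloud:

gcloud services list --enabled --project=my-app-project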
.terraform/base/project_iam.tf
# Service account for application
resource "google_service_account" "app_sa" {
project = var.project_id
account_id = var.app_slug
display_name = "${var.app_name} Service Account"
depends_on = [time_sleep.services_ready]
}
# Allow application SA to write logs
resource "google_project_iam_member" "app_sa_monitoring_logWriter" {
project = var.project_id
role = "roles/logging.logWriter"
member = "serviceAccount:${google_service_account.app_sa.email}"
}
.terraform/base/artifact_registry.tf
# Create artifact registry repository for application Docker images
resource "google_artifact_registry_repository" "docker_repo" {
project = var.project_id
location = "europe"
repository_id = var.app_slug
description = "${var.app_name} Docker Repository"
format = "DOCKER"
depends_on = [time_sleep.services_ready]
}
# Grant the service account access to the repository
resource "google_artifact_registry_repository_iam_member" "my_app_sa_artifactregistry_reader" {
project = google_artifact_registry_repository.docker_repo.project
location = google_artifact_registry_repository.docker_repo.location
repository = google_artifact_registry_repository.docker_repo.repository_id
role = "roles/artifactregistry.reader"
member = "serviceAccount:${google_service_account.app_sa.email}"
}
.terraform/base/secret_manager.tf
# Create application secret
resource "google_secret_manager_secret" "secret_file" {
project = var.project_id
secret_id = "secret_file"
replication {
auto {}
}
depends_on = [time_sleep.services_ready]
}
# Grant the service account access to the secret
resource "google_secret_manager_secret_iam_member" "my_app_sa_secretAccessor_secret_file" {
project = google_secret_manager_secret.secret_file.project
secret_id = google_secret_manager_secret.secret_file.id
role = "roles/secretmanager.secretAccessor"
member = "serviceAccount:${google_service_account.app_sa.email}"
}
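.terraform/base/vpc.tf
The contents of vpc.tf are not shown in this guide, so here is a minimal sketch that matches the resource names referenced in outputs.tf below. The subnet CIDR range is an assumption; the important part is that private_ip_google_access is enabled:

# Custom VPC network for the application
resource "google_compute_network" "app" {
  project                 = var.project_id
  name                    = var.app_slug
  auto_create_subnetworks = false
  depends_on              = [time_sleep.services_ready]
}

# Subnet with Private Google Access, so a VM with only a private IP
# can still reach Secret Manager, Artifact Registry and the logging API
resource "google_compute_subnetwork" "app_subnet_1" {
  project                  = var.project_id
  name                     = "${var.app_slug}-subnet-1"
  region                   = var.region
  network                  = google_compute_network.app.id
  ip_cidr_range            = "10.0.0.0/24" # assumption, any free range works
  private_ip_google_access = true
}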
.terraform/base/outputs.tf
output "service_account_email" {
value = google_service_account.app_sa.email
}
output "secret_id" {
value = google_secret_manager_secret.secret_file.id
}
output "docker_registry_url" {
value = format(
"%s-docker.pkg.dev/%s/%s",
google_artifact_registry_repository.docker_repo.location,
google_artifact_registry_repository.docker_repo.project,
google_artifact_registry_repository.docker_repo.repository_id
)
}
output "network_self_link" {
value = google_compute_network.app.self_link
}
output "subnetwork_self_link" {
value = google_compute_subnetwork.app_subnet_1.self_link
}
Deploy sub-module
Deploy infrastructure should contain all the necessary resources that will be created or updated on each deploy. Deploy infrastructure contains:
cloud-init configuration that will be used to set up the VM instance, including fetching the secret and storing it on disk.
Compute Engine VM instance using the Container-Optimized OS image and a container specification that will mount this secret into the running container.
Directory structure:
.terraform/deploy
├── cloudinit.yaml.tpl
├── compute_engine.tf
├── providers.tf
└── variables.tf
.terraform/deploy/variables.tf
variable "project_id" {
description = "The GCP project ID"
}
variable "app_slug" {
description = "The application slug, e.g. my-app"
}
variable "app_image_url" {
description = "The URL of the Docker image"
}
variable "secret_id" {
description = "The secret ID, e.g. projects/my-project/secrets/my-secret"
}
variable "service_account_email" {
description = "The email of the service account"
}
variable "network_self_link" {
description = "The self link of the network"
}
variable "subnetwork_self_link" {
description = "The self link of the subnetwork"
}
variable "zone" {
description = "The GCP zone"
}
.terraform/deploy/providers.tf
terraform {
required_providers {
google = {
source = "hashicorp/google"
}
}
}
.terraform/deploy/cloudinit.yaml.tpl
Cloud-init is the industry standard multi-distribution method for cross-platform cloud instance initialization. It is supported across all major public cloud providers, provisioning systems for private cloud infrastructure, and bare-metal installations. The cloudinit.yaml.tpl
file is a template that will be used to generate cloud-config
file that will be used to:
create empty secret file on disk
create a script that reads secret from Secret Manager and writes it to empty secret file
configure VM instance to run this script on startup
The scrip itself is using GCP metadata server to fetch workload identity service account access token, then uses this token to fetch secret from Secret Manager and write it to /etc/secret_file
.
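For reference, the token endpoint returns a single-line JSON object, which is why splitting on double quotes and taking field 4 yields the raw access token. The request can be reproduced manually from inside any GCP VM (token shortened here):

curl -s -H 'Metadata-Flavor: Google' \
  http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
# {"access_token":"ya29.c.b0A...","expires_in":3599,"token_type":"Bearer"}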
#cloud-config
write_files:
# Create an empty file so it always exists when mounting a volume into the container.
# When the source file or directory is missing, Docker creates a root-owned directory
# at the same path and mounts that inside the container instead.
- path: /etc/secret_file
permissions: '0644'
owner: root
content: ''
# Script that will take care of fetching the secret from secret manager, using VM service account access token
- path: /etc/scripts/fetch-secret.sh
permissions: '0755'
owner: root
content: |
#!/bin/bash
# this script:
# 1. generates access token using google metadata server
# 2. fetches the secret version contents
# 3. decodes it and writes it to the /etc/secret_file
set -e
secret_version_id="${secret_id}/versions/${secret_version}"
access_token=$(
curl http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token \
-H 'Metadata-Flavor: Google' \
| cut -d'"' -f 4
)
curl "https://secretmanager.googleapis.com/v1/$secret_version_id:access?fields=payload.data&prettyPrint=false" \
-H "Authorization: Bearer $access_token" \
-H 'Accept: application/json' | cut -d '"' -f 6 | base64 -d > /etc/secret_file
runcmd:
- /etc/scripts/fetch-secret.sh
.terraform/deploy/compute_engine.tf
locals {
# Docker container specification
container_spec = {
spec = {
restartPolicy = "OnFailure"
# Volumes that will be mounted to the application container
volumes = [
{
name = "secret"
hostPath = {
# Path inside VM where the secret file is located
path = "/etc/secret_file"
}
},
]
containers = [{
name = var.app_slug
image = var.app_image_url
# Define what volumes to mount inside the container
volumeMounts = [
{
# Path inside application container to mount the secret file
mountPath = "/secret"
name = "secret"
readOnly = true
},
]
env = [
{
name = "MY_ENV_VAR"
value = "my-env-var-value"
},
]
}]
}
}
}
# Get the latest stable Container-Optimized OS image data
data "google_compute_image" "cos" {
family = "cos-stable"
project = "cos-cloud"
}
# Generate a cloud-init configuration based on template file `cloudinit.yaml.tpl`.
# It will be used to setup the VM.
data "template_file" "cloudinit" {
template = file("${path.module}/cloudinit.yaml.tpl")
vars = {
project_id = var.project_id
secret_id = var.secret_id
secret_version = "latest"
}
}
locals {
# Container specification needs to be passed in YAML format in VM metadata
container_spec_yaml = yamlencode(local.container_spec)
# MD5 hash made from container spec and cloud-init file content.
machine_config_hash = md5(
format(
"%s-%s",
local.container_spec_yaml,
data.template_file.cloudinit.rendered
)
)
# Use config hash inside instance name to ensure that the instance is recreated when the configuration changes.
compute_instance_name = format("%s-%s", var.app_slug, substr(local.machine_config_hash, 0, 5))
}
# Create a VM instance for the application
resource "google_compute_instance" "my_app_vm" {
project = var.project_id
name = local.compute_instance_name
machine_type = "f1-micro"
zone = var.zone
allow_stopping_for_update = true
boot_disk {
initialize_params {
image = data.google_compute_image.cos.self_link
size = 10
type = "pd-ssd"
}
}
metadata = {
google-logging-enabled = "true"
# Define the container spec
gce-container-declaration = local.container_spec_yaml
enable-oslogin = "TRUE"
# Set cloud-init file contents
user-data = data.template_file.cloudinit.rendered
block-project-ssh-keys = true
}
# Set up the network interface for the VM; it will only have a private IP address
network_interface {
network = var.network_self_link
subnetwork = var.subnetwork_self_link
}
# Set service account for the VM to identify itself to other services in GCP
service_account {
scopes = ["cloud-platform"]
email = var.service_account_email
}
}
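A side note on the template_file data source used above: it comes from the legacy hashicorp/template provider, which is not listed in the deploy module's required_providers (Terraform pulls it in implicitly) and has been archived by HashiCorp. The built-in templatefile() function renders the same template without any extra provider; a sketch of the equivalent, with the rest of the module then referencing local.cloudinit_rendered instead of data.template_file.cloudinit.rendered:

locals {
  cloudinit_rendered = templatefile("${path.module}/cloudinit.yaml.tpl", {
    project_id     = var.project_id
    secret_id      = var.secret_id
    secret_version = "latest"
  })
}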
Root Terraform module
The root module will use both base and deploy sub-modules to create the whole infrastructure.
Directory structure should look like this:
.terraform # Root of the terraform project
├── base # Base submodule directory
├── deploy # Deploy submodule directory
├── main.tf # Main terraform configuration which uses base and deploy submodules
├── terraform.tfvars # Terraform variable values file, for convenience (not tracked by Git)
└── variables.tf # Terraform variables definition file
.terraform/variables.tf
variable "project_id" {
description = "The GCP project ID"
}
variable "app_name" {
description = "The application name, e.g.: My App"
}
variable "app_slug" {
description = "The application slug, e.g. my-app"
}
variable "region" {
description = "The GCP region"
}
variable "zone" {
description = "The GCP zone"
}
.terraform/main.tf
provider "google" {
project = var.project_id
}
# Base infrastructure
module "base" {
source = "./base"
project_id = var.project_id
app_name = var.app_name
app_slug = var.app_slug
region = var.region
}
# Deploy infrastructure
module "deploy" {
source = "./deploy"
project_id = var.project_id
app_slug = var.app_slug
app_image_url = "${module.base.docker_registry_url}/${var.app_slug}:latest"
secret_id = module.base.secret_id
service_account_email = module.base.service_account_email
zone = var.zone
network_self_link = module.base.network_self_link
subnetwork_self_link = module.base.subnetwork_self_link
}
# Output the Docker image tag so it can be used when building the image
output "docker_image_tag" {
value = format("%s/%s:latest", module.base.docker_registry_url, var.app_slug)
}
.terraform/terraform.tfvars
The terraform.tfvars file will contain all the variable values that will be used when applying changes. This avoids specifying variables on the command line each time. The file should not be tracked by VCS and is used only for convenience.
project_id = "my-app-project"
app_name = "My App"
app_slug = "my-app"
region = "europe-west1"
zone = "europe-west1-b"
Apply base infrastructure
tofu apply -target module.base
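If this is the first run, providers need to be downloaded first, so initialize the working directory beforehand:

cd .terraform
tofu init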
Create Dockerfile in project root
It will be based on the stable Debian image and will create a script "inline" that reads the secret from a file and prints it to stdout. This script will be used as the default container command.
# syntax=docker/dockerfile:1
FROM debian:stable-slim
COPY <<EOF /app.sh
#!/usr/bin/env bash
set -Eeuo pipefail
while true; do
[[ -f /secret ]] && echo "Secret file contains: $(cat /secret)" || echo "Secret file does not exist!";
sleep 30;
done
EOF
RUN chmod +x /app.sh
CMD ["/app.sh"]
Build and push application image
docker build -t europe-docker.pkg.dev/my-app-project/my-app/my-app:latest .
docker push europe-docker.pkg.dev/my-app-project/my-app/my-app:latest
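If the push fails with an authentication error, Docker needs to be configured to use gcloud credentials for this Artifact Registry host first (a one-time setup):

gcloud auth configure-docker europe-docker.pkg.dev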
Create secret version
When applying the base sub-module, a secret was created in Secret Manager. However, it is only a "container" for secret versions and by itself does not hold any secret value. We need to create a secret version that contains the actual secret value.
printf "my-secret-value" | gcloud secrets versions add secret_file --data-file=- --project=my-app-project
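To confirm the version was stored correctly, the value can be read back (note that this prints the secret in plain text):

gcloud secrets versions access latest --secret=secret_file --project=my-app-project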
Apply deploy infrastructure
tofu apply -target module.deploy
Check logs
After the VM is created, it should start the container and print the secret value to stdout every 30 seconds. We should wait a few minutes and then read the logs. Before that, however, we need to find the instance ID.
$ gcloud compute instances list --format="table(name, id)" --project=my-app-project
NAME ID
my-app-38ec6 2322076665855158865
Now we can read the logs to confirm all works as expected:
$ gcloud logging read 'resource.type="gce_instance" resource.labels.instance_id="2322076665855158865"' \
--project=my-app-project \
--format="table(timestamp, jsonPayload.message)" \
--limit 5
TIMESTAMP MESSAGE
2024-11-04T14:21:31.613754143Z Secret file contains: my-secret-value
2024-11-04T14:21:01.610290064Z Secret file contains: my-secret-value
2024-11-04T14:20:31.605553541Z Secret file contains: my-secret-value
2024-11-04T14:20:01.602425176Z Secret file contains: my-secret-value
2024-11-04T14:19:31.597518758Z Secret file contains: my-secret-value
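Once you are done experimenting, the whole infrastructure can be torn down from the .terraform/ directory to avoid unnecessary charges:

tofu destroy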
The code for this guide is located in the piotrekkr/blog-using-gcp-secret-manager-with-cos-vm repo.