HashiCorp Vault Cluster Setup with Raft Backend, Nginx Reverse Proxy, and Keepalived

Introduction

In this article, we will set up a HashiCorp Vault cluster using the Raft backend, accessed via an Nginx reverse proxy, and ensure high availability for the Nginx layer with Keepalived.

My Commentary: This introduction clearly states the objective: building a highly available (HA) and resilient HashiCorp Vault setup.

  • HashiCorp Vault: For those unfamiliar, Vault is a tool for securely storing, managing, and accessing secrets (API keys, passwords, certificates, etc.). It's crucial for modern, secure application environments.

  • Raft Backend: This is Vault's built-in consensus mechanism for high availability, eliminating the need for external dependencies like Consul or PostgreSQL for the storage backend. This simplifies the architecture for HA considerably.

  • Nginx Reverse Proxy: Nginx will act as the public-facing endpoint, forwarding requests to the Vault cluster. This allows for SSL termination, basic load balancing, and can add an extra layer of security.

  • Keepalived: This is key for Nginx's high availability. Keepalived implements VRRP (Virtual Router Redundancy Protocol) to provide a floating IP (Virtual IP or VIP). If the primary Nginx server fails, Keepalived automatically moves the VIP to a healthy backup Nginx server, ensuring continuous service.

This combination of technologies is a common and robust pattern for providing HA to critical services like Vault.


Prerequisites

  • Docker

  • Docker Compose

  • Git

  • Make

My Commentary: These prerequisites suggest a local development or testing environment.

  • Docker & Docker Compose: Essential for containerizing Vault, Nginx, and Keepalived, making the setup reproducible and isolated. This is excellent for demonstration and rapid prototyping.

  • Git & Make: Used for cloning the repository and automating build/run processes, typical for DevOps workflows.

For a production environment, one would typically move beyond simple Docker Compose. Consider:

  • Kubernetes/OpenShift: For orchestrating containers at scale, providing built-in HA, self-healing, and service discovery.

  • Infrastructure as Code (IaC): Tools like Terraform for provisioning underlying infrastructure (VMs, networks, load balancers).

  • Cloud-Native Solutions: Utilizing cloud-specific load balancers (AWS ELB/ALB, Azure Load Balancer, GCP Load Balancer) for the Nginx layer, which offer managed HA and scalability out-of-the-box.

  • Secrets Management for Vault itself: How will the initial root token and unseal keys be handled securely?


Vault Server Configuration (vault.hcl)

Vault servers are configured using an HCL (HashiCorp Configuration Language) file. Here’s an example vault.hcl:

storage "raft" {
  path    = "/vault/data"
  node_id = "node1" # This should be unique for each node
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = "true" # IMPORTANT: Only for development/testing!
}

cluster_addr = "http://node1:8201" # This should be unique for each node
api_addr     = "http://node1:8200" # This should be unique for each node

My Commentary: This is the core configuration for a Vault server in a Raft cluster.

  • storage "raft":

    • path: Specifies where Raft data (Vault's operational data, state) will be stored persistently. This must be mapped to a persistent volume outside the container in a production setup to prevent data loss on container restart or deletion.

    • node_id: Crucially, this node_id must be unique for each Vault instance in the cluster. The example shows node1; in a 3-node cluster you'd have node1, node2, and node3, each with its own configuration file.

    • Raft provides strong consistency and self-healing. It requires a quorum (majority) of nodes to be healthy for writes to proceed: a cluster of N nodes needs floor(N/2) + 1 healthy nodes, so a 3-node cluster tolerates one failure and a 5-node cluster tolerates two. Common cluster sizes are 3 or 5 nodes.

  • listener "tcp":

    • address: 0.0.0.0:8200 means Vault listens on all available network interfaces on port 8200.

    • tls_disable = "true": This is a critical security warning! As the comment states, this is only for development or scenarios where an external component (like Nginx in this case) handles TLS. In any production environment, Vault should always use TLS directly (tls_disable = "false") with proper certificates. Even if Nginx handles external TLS, communication between Nginx and Vault, and between Vault nodes, should ideally be TLS-encrypted for defense in depth.

  • cluster_addr: The address Vault uses to communicate with other Vault nodes in the Raft cluster. This is essential for inter-node communication and Raft consensus. It's often on a dedicated "cluster" port (e.g., 8201). Again, for each node, this should point to its own unique address.

  • api_addr: The address where the Vault API is exposed. Clients (and the Nginx proxy) will connect to this address. Also, for each node, this should point to its own unique address.

The node_id, cluster_addr, and api_addr will need to be dynamically set for each Vault container, which Docker Compose can help with.
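
For illustration, here is what a second node's configuration might look like (a hypothetical vault2.hcl; the node name and addresses follow the Docker Compose service names used below). The optional retry_join block lets the node join the cluster automatically instead of requiring a manual vault operator raft join:

storage "raft" {
  path    = "/vault/data"
  node_id = "node2" # Unique per node

  retry_join {
    leader_api_addr = "http://vault1:8200" # Existing node to join through
  }
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = "true" # IMPORTANT: Only for development/testing!
}

cluster_addr = "http://vault2:8201"
api_addr     = "http://vault2:8200"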


Vault Client Configuration for Auto-Unseal (client_vault.hcl)

Vault can be configured for auto-unseal using cloud-native Key Management Services (KMS) like AWS KMS, Azure Key Vault, GCP KMS, or HashiCorp's own Transit Secrets Engine. This removes the manual unsealing step, crucial for automated deployments and recovery.

# This section demonstrates AWS KMS for auto-unseal
seal "awskms" {
  region     = "eu-west-1"
  kms_key_id = "your-kms-key-id" # Replace with your actual KMS key ID
}

My Commentary:

  • Auto-Unseal: This is a fantastic feature for production Vault deployments. When Vault starts, it's in a "sealed" state, meaning it cannot access its data. Manual unsealing requires providing a threshold of unseal keys. Auto-unseal offloads this to a trusted KMS service, making the process seamless and automated, especially after restarts or outages.

  • seal "awskms": The example shows AWS KMS. You'd need to configure the Vault server's IAM role (or credentials) to allow it to interact with the specified KMS key.

  • Alternatives:

    • Azure Key Vault: seal "azurekeyvault"

    • Google Cloud KMS: seal "gcpckms"

    • HashiCorp Transit Secrets Engine: seal "transit" (Requires another Vault cluster or dedicated instance for this, often used in multi-cluster scenarios or for air-gapped environments).

    • HashiCorp Cloud Platform (HCP) Vault: For managed Vault, auto-unseal is handled automatically.

Despite the "client" name, this is server-side configuration: the seal stanza would be merged into the main vault.hcl (or supplied as an additional configuration file) on each Vault server.
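
As a sketch of one of those alternatives, a seal "transit" stanza pointing at a separate Vault cluster might look like the following (the address, token, and key name are placeholders you would replace):

seal "transit" {
  address    = "https://unseal-vault.example.com:8200" # Vault cluster providing the Transit engine
  token      = "<token-with-transit-permissions>"
  key_name   = "autounseal"
  mount_path = "transit/"
}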


Docker Compose Setup

Here's the docker-compose.yml file to orchestrate the services:

version: '3.8'

services:
  vault1:
    image: hashicorp/vault:1.15.2
    container_name: vault1
    cap_add:
      - IPC_LOCK
    ports:
      - "8200:8200"
      - "8201:8201"
    environment:
      VAULT_ADDR: "http://0.0.0.0:8200"
      VAULT_API_ADDR: "http://vault1:8200"
      VAULT_CLUSTER_ADDR: "http://vault1:8201"
      VAULT_LOG_LEVEL: "info"
    volumes:
      - ./vault/config/vault1.hcl:/vault/config/vault.hcl # Mount config for each node
      - ./vault/data1:/vault/data # Mount persistent data volume for each node
    networks:
      - vault_network
    command: "server -config=/vault/config/vault.hcl"

  # vault2 and vault3 would be similar, with unique node_id, data paths, and container names/hostnames
  vault2:
    image: hashicorp/vault:1.15.2
    container_name: vault2
    cap_add:
      - IPC_LOCK
    ports:
      - "8202:8200" # Exposing on different host port for local access if needed
      - "8203:8201"
    environment:
      VAULT_ADDR: "http://0.0.0.0:8200"
      VAULT_API_ADDR: "http://vault2:8200"
      VAULT_CLUSTER_ADDR: "http://vault2:8201"
      VAULT_LOG_LEVEL: "info"
    volumes:
      - ./vault/config/vault2.hcl:/vault/config/vault.hcl
      - ./vault/data2:/vault/data
    networks:
      - vault_network
    command: "server -config=/vault/config/vault.hcl"
    depends_on:
      - vault1 # Simple dependency, not for HA

  vault3:
    image: hashicorp/vault:1.15.2
    container_name: vault3
    cap_add:
      - IPC_LOCK
    ports:
      - "8204:8200"
      - "8205:8201"
    environment:
      VAULT_ADDR: "http://0.0.0.0:8200"
      VAULT_API_ADDR: "http://vault3:8200"
      VAULT_CLUSTER_ADDR: "http://vault3:8201"
      VAULT_LOG_LEVEL: "info"
    volumes:
      - ./vault/config/vault3.hcl:/vault/config/vault.hcl
      - ./vault/data3:/vault/data
    networks:
      - vault_network
    command: "server -config=/vault/config/vault.hcl"
    depends_on:
      - vault1 # Simple dependency, not for HA

  nginx1:
    image: nginx:latest
    container_name: nginx1
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro # For SSL certificates
    networks:
      - vault_network
    depends_on:
      - vault1 # Nginx depends on at least one Vault node to start

  nginx2:
    image: nginx:latest
    container_name: nginx2
    # ports:
    #   nginx2 typically has no host port mapping; Keepalived manages the VIP.
    #   If you want to reach it directly for testing, map different host ports, e.g.:
    #   - "81:80"
    #   - "444:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
    networks:
      - vault_network
    depends_on:
      - vault1

  keepalived:
    image: osixia/keepalived:latest
    container_name: keepalived
    cap_add:
      - NET_ADMIN # Required for VIP management
      - NET_BROADCAST
      - NET_RAW
    environment:
      KEEPALIVED_STATE: MASTER # For the first instance, the other would be BACKUP
      KEEPALIVED_INTERFACE: eth0 # Or the correct network interface inside the container
      KEEPALIVED_VIRTUAL_IPS: "172.18.0.100/24" # Example VIP, adjust subnet
      KEEPALIVED_UNICAST_PEERS: "172.18.0.x,172.18.0.y" # IPs of other keepalived containers
      KEEPALIVED_PASSWORD: "your_vrrp_password" # Important for security
      KEEPALIVED_PRIORITY: "101" # Higher for MASTER
      KEEPALIVED_VIRTUAL_ROUTER_ID: "51" # Unique ID for VRRP instance
    volumes:
      - ./keepalived/keepalived.conf:/etc/keepalived/keepalived.conf:ro # Custom config if needed
    networks:
      - vault_network
    sysctls:
      - net.ipv4.ip_nonlocal_bind=1 # Allow binding to non-local IP (VIP)
    depends_on:
      - nginx1
      - nginx2

networks:
  vault_network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.18.0.0/24 # Example subnet

My Commentary: This docker-compose.yml provides a comprehensive blueprint.

  • Vault Services (vault1, vault2, vault3):

    • image: hashicorp/vault:1.15.2: Using a specific, stable version is good practice.

    • cap_add: - IPC_LOCK: Essential for Vault. It prevents Vault from swapping sensitive data to disk, improving security.

    • ports: Mapping container ports to host ports. For vault2 and vault3, the host ports 8202:8200 and 8203:8201 (etc.) are for local host access if you want to curl each Vault node directly. In a real-world scenario, you might not expose these ports directly on the host if Nginx is the sole entry point. Internal communication within vault_network uses the container names (vault1:8200).

    • environment: Dynamically sets VAULT_API_ADDR and VAULT_CLUSTER_ADDR to the container's own hostname. This is critical for inter-Vault communication and how Nginx finds them.

    • volumes:

      • ./vault/config/vaultX.hcl:/vault/config/vault.hcl: Each Vault container gets its specific configuration file, ensuring node_id and addresses are correctly set for that instance.

      • ./vault/dataX:/vault/data: Crucial for persistence. This maps each node's Vault data directory to a host path (or named volume) outside the container; without it, all Vault data would be lost on container removal. In production, each node should use its own durable block storage; Raft replicates data between nodes, so shared storage is neither required nor recommended.

    • networks: - vault_network: All services are on the same bridge network, allowing them to communicate by container name (e.g., nginx1 can reach vault1).

    • depends_on: Controls startup order only; it does not monitor, restart, or fail over services. If vault1 dies, Compose will not react. Proper orchestration (e.g., Kubernetes) handles that.

  • Nginx Services (nginx1, nginx2):

    • ports: - "80:80" - "443:443": Nginx listens on standard HTTP/HTTPS ports. Note that for nginx2 (the backup), these ports might not be directly mapped to the host if Keepalived is managing the VIP. Only the Master Nginx will have the VIP bound.

    • volumes: Mounting nginx.conf and ssl directories for configuration and certificates. For production, replace placeholder SSL certs with real, trusted ones (Let's Encrypt, commercial CAs).

    • depends_on: Nginx needs Vault to be up, but again, this is basic.

  • Keepalived Service:

    • image: osixia/keepalived:latest: A convenient pre-built Keepalived container. As with Vault, pinning a specific tag rather than latest is preferable for reproducibility.

    • cap_add: - NET_ADMIN, - NET_BROADCAST, - NET_RAW: These capabilities are absolutely necessary for Keepalived to manage network interfaces and IPs (like the VIP).

    • environment:

      • KEEPALIVED_STATE: MASTER for the primary, BACKUP for the secondary.

      • KEEPALIVED_INTERFACE: eth0 is a common default for Docker bridge networks. Verify it's the correct interface inside the container.

      • KEEPALIVED_VIRTUAL_IPS: The Virtual IP (VIP) that will float between the Nginx instances. This is the single entry point for clients.

      • KEEPALIVED_UNICAST_PEERS: Crucial. The internal network IPs of the other Keepalived containers. This is how Keepalived instances find and communicate with each other (VRRP heartbeat). This will be 172.18.0.X based on your Docker network.

      • KEEPALIVED_PASSWORD: Important for securing VRRP communication.

      • KEEPALIVED_PRIORITY: Higher value for the desired MASTER.

      • KEEPALIVED_VIRTUAL_ROUTER_ID: Unique ID for the VRRP instance within the network segment.

    • sysctls: - net.ipv4.ip_nonlocal_bind=1: Allows Keepalived to bind to an IP address that isn't directly configured on the network interface (the VIP).

    • depends_on: Keepalived needs Nginx to be up to perform health checks.

  • networks: Defining a custom bridge network provides better isolation and allows using service names for internal communication.

This setup creates a robust, containerized environment for demonstration. For production, consider:

  • Persistent Storage: More robust solutions than host mounts (e.g., Docker volumes managed by a volume plugin, NFS, cloud block storage).

  • Networking: Dedicated internal networks, possibly without port mapping to the host for Vault, relying solely on Nginx as the gateway.

  • Security: Stronger firewalls, network ACLs, TLS everywhere.

  • Monitoring & Alerting: Integration with Prometheus, Grafana, Alertmanager to track the health of Vault, Nginx, and Keepalived.

  • Secrets Management for Setup: How will the KMS credentials for auto-unseal be provided securely to the Vault containers?


Vault Initialization & Unseal

Once the Vault containers are running, you need to initialize the cluster. This is typically done from one of the Vault containers:

docker exec vault1 vault operator init -key-shares=3 -key-threshold=2 -format=json > cluster_keys.json

My Commentary:

  • vault operator init: This command performs the initial setup of the Vault cluster.

    • -key-shares=3: Generates 3 unseal keys.

    • -key-threshold=2: Requires 2 of these keys to unseal Vault. This adheres to the "N of M" security principle.

    • -format=json > cluster_keys.json: Outputs the root token and unseal keys to a JSON file.

CRITICAL SECURITY WARNING:

  • Securely store cluster_keys.json: The root token and unseal keys are the "keys to the kingdom." Do NOT leave this file on the server or in version control. These should be distributed securely to trusted individuals (e.g., using a secure key management system, physical safe, or split knowledge).

  • Recovery Keys: If auto-unseal is configured, vault operator init produces recovery keys (recovery_keys_b64) instead of Shamir unseal keys. They are used for privileged recovery operations (such as generating a new root token or migrating the seal), not for routine unsealing, which the KMS handles. With the plain Shamir seal shown here, the unseal_keys_b64 values are what you use to unseal manually.

  • Root Token: This token has full administrative privileges. Use it only for initial setup and creating a less privileged admin account. Then, revoke the root token.

  • Unsealing (Manual - if not using auto-unseal): If auto-unseal is not configured, you would manually unseal each Vault node after initialization:

      docker exec vault1 vault operator unseal <key_from_cluster_keys.json>
      docker exec vault1 vault operator unseal <second_key_from_cluster_keys.json>
      # Repeat for other nodes (vault2, vault3) using the same keys
    

    With auto-unseal via KMS, this manual step is largely eliminated, which is a major advantage for production. The seal "awskms" configuration snippet would handle this automatically on startup.
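
One step the commands above don't show: with the Raft backend, vault2 and vault3 must join the cluster before they can be unsealed (unless a retry_join block is configured, as sketched earlier). Assuming the container names from the Compose file, that would look roughly like:

docker exec vault2 vault operator raft join http://vault1:8200
docker exec vault3 vault operator raft join http://vault1:8200
# Then unseal each joined node with the threshold number of the same unseal keys
docker exec vault2 vault operator unseal <key_from_cluster_keys.json>
docker exec vault3 vault operator unseal <key_from_cluster_keys.json>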


Nginx Configuration (nginx.conf)

This configuration enables Nginx to act as a reverse proxy for the Vault cluster, handling SSL/TLS.

events {} # Required top-level section when this file is used as the complete nginx.conf

http {
    upstream vault_servers {
        server vault1:8200;
        server vault2:8200;
        server vault3:8200;
        # You can add load balancing algorithms here, e.g., least_conn, ip_hash
    }

    server {
        listen 80;
        server_name your.vault.domain.com; # Replace with your domain
        return 301 https://$host$request_uri; # Redirect HTTP to HTTPS
    }

    server {
        listen 443 ssl;
        server_name your.vault.domain.com; # Replace with your domain

        ssl_certificate /etc/nginx/ssl/vault.crt; # Your SSL certificate
        ssl_certificate_key /etc/nginx/ssl/vault.key; # Your SSL private key
        ssl_protocols TLSv1.2 TLSv1.3; # Enforce strong protocols
        ssl_ciphers "EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH"; # Strong ciphers
        ssl_prefer_server_ciphers on;
        ssl_session_cache shared:SSL:10m;
        ssl_session_timeout 10m;

        location / {
            proxy_pass http://vault_servers; # Proxy to the upstream Vault cluster
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_connect_timeout 600;
            proxy_send_timeout 600;
            proxy_read_timeout 600;
            send_timeout 600;
        }
    }
}

My Commentary: This nginx.conf is a solid starting point for proxying Vault.

  • upstream vault_servers:

    • Defines a group of backend Vault servers. Nginx uses round-robin load balancing by default when no algorithm is specified.

    • Recommendation: For Vault, which has a single active leader, least_conn or ip_hash can sometimes be preferred, but plain round-robin works: requests that land on a standby node are forwarded or redirected to the active node, and Vault clients handle the redirect.

  • HTTP to HTTPS Redirect (listen 80 block): Excellent practice for security. All HTTP traffic is forced to HTTPS.

  • HTTPS Server Block (listen 443 ssl block):

    • ssl_certificate / ssl_certificate_key: Replace these with your actual production-ready SSL certificates and keys. Never use self-signed certificates in production. Ensure these files are properly secured on the host.

    • ssl_protocols / ssl_ciphers: Enforcing TLSv1.2 and TLSv1.3 with strong ciphers is crucial for modern security best practices.

  • location / block:

    • proxy_pass http://vault_servers;: This is the core instruction, forwarding all requests to the vault_servers upstream group.

    • proxy_set_header: These headers are vital for Vault (and any proxied application) to correctly identify the original client's IP, hostname, and protocol. Without them, Vault would see Nginx's IP as the client.

    • proxy_connect_timeout, etc.: These timeouts can be adjusted based on expected Vault response times, especially for long-running operations or during heavy load. 600 seconds (10 minutes) might be quite high; consider if such long requests are expected for Vault.

  • Security Considerations for Nginx:

    • WAF (Web Application Firewall): For production, consider integrating a WAF to protect against common web exploits.

    • Rate Limiting: Implement Nginx rate limiting to prevent abuse or denial-of-service attacks (a small sketch follows this list).

    • Access Control: Add allow/deny rules if access should be restricted to specific IP ranges.

    • Logging: Configure comprehensive logging for auditing and troubleshooting.
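
As an example of the rate-limiting suggestion above, a minimal sketch using Nginx's limit_req module (the zone name, rate, and burst values are arbitrary placeholders to tune for your traffic):

# In the http block:
limit_req_zone $binary_remote_addr zone=vault_ratelimit:10m rate=10r/s;

# In the location / block of the HTTPS server:
location / {
    limit_req zone=vault_ratelimit burst=20 nodelay;
    proxy_pass http://vault_servers;
    # ... existing proxy_set_header and timeout directives ...
}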


Keepalived Configuration (keepalived.conf)

Keepalived provides high availability for the Nginx instances by using VRRP to manage a floating IP address.

vrrp_script check_nginx {
    script "killall -0 nginx" # Checks if nginx process is running
    interval 2 # Check every 2 seconds
    weight 50 # If script fails, priority decreases by 50
}

vrrp_instance VI_1 {
    state MASTER # For the primary nginx instance, set to BACKUP for the secondary
    interface eth0 # The network interface Keepalived will monitor
    virtual_router_id 51 # Unique ID for this VRRP instance
    priority 101 # Higher priority for MASTER, e.g., 100 for BACKUP
    advert_int 1 # Advertisement interval in seconds
    authentication {
        auth_type PASS
        auth_pass your_vrrp_password # Must match across all Keepalived instances
    }
    virtual_ipaddress {
        172.18.0.100/24 # The Virtual IP address
    }
    track_script {
        check_nginx # Link to the script defined above
    }
    notify_master "/etc/keepalived/notify.sh master"
    notify_backup "/etc/keepalived/notify.sh backup"
    notify_fault "/etc/keepalived/notify.sh fault"
}

My Commentary: This Keepalived configuration is standard for a simple active-passive (or active-backup) HA setup.

  • vrrp_script check_nginx:

    • script "killall -0 nginx": This is a basic health check. It simply verifies if the nginx process is running.

    • Improvement: For a more robust setup, the script should check that Nginx can actually serve Vault requests, not just that the nginx process exists. Something like curl -sk https://localhost/v1/sys/health is much better (the location / block proxies that path straight to Vault); the active node returns 200 and standbys return 429. If Nginx is running but unable to reach Vault, the simple killall check would never trigger a failover. A sketch of such a script follows this commentary.

    • interval: How often the script runs.

    • weight: If the script fails, this value is subtracted from the priority, potentially causing a failover.

  • vrrp_instance VI_1:

    • state: MASTER for the active node, BACKUP for the standby. Keepalived uses priority to determine the actual master.

    • interface: The network interface on which the VRRP heartbeat and VIP will operate. It must match the Docker container's network interface (e.g., eth0).

    • virtual_router_id: A unique identifier for this VRRP instance. All Keepalived instances managing the same VIP must share this ID.

    • priority: Determines which node becomes master. Higher priority wins. If the master fails, the next highest priority backup takes over.

    • advert_int: How often VRRP advertisements (heartbeats) are sent.

    • authentication: Crucial for security. Prevents unauthorized machines from joining your VRRP group. PASS is a simple password; consider using AH for stronger authentication in production.

    • virtual_ipaddress: The floating IP address (VIP). This is the address clients will use to connect to your highly available service.

    • track_script check_nginx: Links the health check script to the VRRP instance, triggering failover if the script fails.

    • notify_master, notify_backup, notify_fault: These are shell scripts that can be executed when the Keepalived instance changes state. Useful for logging, sending alerts (e.g., to Slack, PagerDuty), or performing other actions during failover events.
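
Following up on the health-check improvement suggested above, here is a sketch of a check script that only reports healthy when Nginx can actually answer a Vault health query. The filename is illustrative, and it assumes the script runs on the same host (or network namespace) as Nginx; Vault's /v1/sys/health endpoint returns 200 for the active node and 429 for standbys:

#!/bin/sh
# /etc/keepalived/check_vault_via_nginx.sh (hypothetical path)
# Succeed only if Nginx responds and Vault reports a usable state.
code=$(curl -sk -o /dev/null -w '%{http_code}' https://localhost/v1/sys/health)
case "$code" in
    200|429|473) exit 0 ;;  # active, standby, or performance standby
    *) exit 1 ;;
esac

The vrrp_script block would then point at it, e.g. script "/etc/keepalived/check_vault_via_nginx.sh".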

Overall for Keepalived:

  • This setup ensures that if the primary Nginx server (or its Nginx process) goes down, the VIP will automatically move to the backup Nginx, providing seamless failover.

  • Consider a 3-node Nginx/Keepalived setup for true fault tolerance: A 2-node setup (master/backup) works, but if the master goes down and the backup also fails before the master recovers, you're out of service. A 3-node (or more) setup with proper health checks and priority management can provide more robust resilience.

  • Placement: In a VM environment, ensure Nginx/Keepalived pairs are on different physical hosts for true HA. In Docker Compose, they're on the same host unless you deploy them across multiple Docker Swarm/Kubernetes nodes.


Testing

To verify the setup, you can check the status of Vault and Nginx.

Vault Status:

docker exec vault1 vault status

You should see output similar to the following, showing whether the cluster is initialized and sealed, the storage type, and the node's HA mode (active or standby).

Key                         Value
---                         -----
Seal Type                   shamir
Initialized                 true
Sealed                      false
Total Shares                3
Threshold                   2
Version                     1.15.2
Build Date                  2023-11-20T12:35:48Z
Storage Type                raft
Cluster Name                vault-cluster-d6d7e0d7
Cluster ID                  2430ae1c-2234-7a32-1b1a-8252277d0180
HA Enabled                  true
HA Cluster                  https://vault1:8201 # This will vary based on your env
HA Mode                     active
Active Since                2023-12-01T10:00:00Z

Nginx Status (via VIP): Access your configured your.vault.domain.com (or the VIP directly) in your browser or with curl.

curl -k https://your.vault.domain.com/v1/sys/health

You should get a JSON response indicating Vault's health status. The -k flag is important if you're using self-signed certificates for testing.

My Commentary:

  • vault status: This is a fundamental check. Pay attention to:

    • Sealed: Should be false if auto-unseal is working.

    • HA Enabled: Should be true.

    • HA Mode: One node should be active (the leader), others standby.

    • Storage Type: Should be raft.

  • Nginx/VIP Testing:

    • Failover Test: The most important test is to simulate a failure.

      • Stop the nginx1 container (docker stop nginx1). Watch the Keepalived logs (e.g., docker logs -f keepalived) to see the failover happen.

      • Verify that your.vault.domain.com (or the VIP) still responds, now served by nginx2.

      • Restart nginx1 and observe if it correctly takes back the master role (preemption) or if nginx2 remains master.

    • Vault Node Failure Test:

      • Stop the vault1 container (docker stop vault1).

      • Verify that the Vault cluster remains healthy (if you have 3 nodes, you still have a quorum). vault status on vault2 or vault3 should show them as active/standby.

      • Ensure Nginx can still proxy to the remaining healthy Vault nodes.

    • Secrets Test: Create a secret via the VIP (vault kv put secret/test value=hello), then try to read it (vault kv get secret/test). This verifies end-to-end functionality.
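
A sketch of that end-to-end test from a machine that can reach the VIP or domain (the token, domain, and KV mount path are placeholders; skipping TLS verification only belongs in testing with self-signed certificates):

export VAULT_ADDR=https://your.vault.domain.com
export VAULT_SKIP_VERIFY=true               # testing with self-signed certs only
vault login                                 # paste a token with KV permissions
vault secrets enable -path=secret kv-v2     # once, if the KV engine isn't enabled yet
vault kv put secret/test value=hello
vault kv get secret/test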


Conclusion

By following these steps, you can set up a highly available HashiCorp Vault cluster using the Raft backend, accessed securely via an Nginx reverse proxy with Keepalived ensuring Nginx's high availability. This robust architecture provides a solid foundation for managing your secrets in a resilient manner.

My Final Commentary: This article provides a solid, practical guide for setting up a high-availability Vault cluster using a common pattern. It's excellent for understanding the components and their interaction.

Beyond this setup, for full production readiness, consider:

  1. Observability:

    • Monitoring: Collect metrics from Vault (using the telemetry stanza in vault.hcl; a minimal example is sketched after this list), Nginx, and Keepalived (e.g., via Prometheus exporters) to track performance, health, and potential issues.

    • Logging: Centralize logs (e.g., ELK Stack, Splunk, Loki) for easier troubleshooting and auditing.

    • Alerting: Set up alerts based on key metrics (e.g., Vault sealed, Nginx down, high error rates).

  2. Security:

    • Network Segmentation: Isolate Vault and its backend storage in dedicated, private network segments.

    • Firewalls: Implement strict firewall rules (Security Groups, Network ACLs) to limit access to Vault and its components.

    • Vault Policies and Authentication: Define granular policies and use appropriate authentication methods (e.g., LDAP, Kubernetes Auth Method, AWS/Azure/GCP Auth Methods) for users and applications accessing secrets.

    • Audit Logging: Enable and review Vault's audit logs for compliance and security monitoring.

  3. Operations:

    • Automated Deployment: Use IaC tools (Terraform, Ansible, Chef, Puppet) or Kubernetes/Helm charts for consistent, automated deployment and management of the entire stack.

    • Backup and Restore: Implement a robust backup and restore strategy for Vault's Raft data (see the snapshot commands sketched after this list).

    • Disaster Recovery: Plan and regularly test for regional outages or major failures.

    • Upgrades: Have a strategy for upgrading Vault, Nginx, and Keepalived with minimal downtime.

    • Secrets Rotation: Automate the rotation of secrets where possible (e.g., database credentials).
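
Two of the points above translate directly into configuration and commands. For metrics (point 1), a minimal telemetry stanza in vault.hcl might look like this (values are illustrative); Prometheus can then scrape /v1/sys/metrics?format=prometheus with an appropriately scoped token:

telemetry {
  prometheus_retention_time = "30s"
  disable_hostname          = true
}

For backup and restore of the Raft data (point 3), Vault ships built-in snapshot commands, which could be scheduled from a cron job or pipeline:

vault operator raft snapshot save vault-backup.snap
# Restore during recovery:
# vault operator raft snapshot restore vault-backup.snap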

This architecture provides a strong, highly available secrets management solution, crucial for any modern secure infrastructure.
