My First Infrastructure Skeleton: From Manual Pain to IaC Sanity

Setting up infrastructure from scratch is often romanticized — until you're face to face with a blinking cursor and nothing installed. This post walks through how I built my initial setup manually, tested everything hands-on, and prepared the system for a future IaC-based rebuild.
1. Server & DNS Setup
Hetzner VPS (CX22)
- Ubuntu base image
- Clean slate environment — no preinstalled surprises
Cloudflare DNS (Proxy Mode)
- DNS with built-in DDoS mitigation
- IP masking and active edge protection
2. Core Stack: Half-Manual, Half-IaC
Instead of full automation from day one, I opted for a hybrid approach: docker-compose
files organized in a central repo, with services running entirely in Docker for simplicity and consistency.
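The repo layout follows that hybrid idea: one compose file per service, all joined to a shared Docker network that the reverse proxy also sits on. A minimal sketch (service name, image, and network name are illustrative, not the actual repo contents):

```yaml
# docker-compose.yml — minimal per-service sketch
services:
  app:
    image: ghcr.io/example/app:latest   # hypothetical image
    restart: unless-stopped
    networks:
      - proxy                 # shared network Traefik also joins

networks:
  proxy:
    external: true            # created once on the host, reused by every compose file
```

Declaring the network as `external` keeps each compose file independently deployable while still letting Traefik discover every container on the same network.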
Deployment Approach
Initial deployments used GitHub Actions, but due to permission complexity and audit concerns, I switched to manual root@vps
deployments. Not elegant, but stable and predictable.
Main Components
Traefik Reverse Proxy
- Automatic HTTPS via Let’s Encrypt
- Dynamic subdomain routing
- Docker integration
- Dashboard: http://localhost:9000
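A sketch of what this Traefik setup looks like in compose form, assuming the TLS challenge and a Docker provider (the domain, email, and the `whoami` demo service are placeholders, not my actual config):

```yaml
# traefik compose sketch — hostnames and email are placeholders
services:
  traefik:
    image: traefik:v3.0
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false   # only route labeled containers
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.tlschallenge=true
      - --certificatesresolvers.le.acme.email=admin@example.com
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
      - --api.dashboard=true
    ports:
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./letsencrypt:/letsencrypt                  # persist issued certificates

  # any service becomes routable by adding labels:
  whoami:
    image: traefik/whoami
    labels:
      - traefik.enable=true
      - traefik.http.routers.whoami.rule=Host(`whoami.example.com`)
      - traefik.http.routers.whoami.entrypoints=websecure
      - traefik.http.routers.whoami.tls.certresolver=le
```

New subdomains are just a new container plus three labels; Traefik picks them up and requests certificates without a restart.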
Cloudflare Tunnel
- One tunnel for public-facing services
- One tunnel for internal/admin interfaces
- All sensitive admin UIs are accessible only through tunnel routing
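The split between public and admin tunnels maps onto cloudflared's ingress rules. A hedged sketch of the admin-side config (tunnel ID, hostnames, and upstream ports are placeholders):

```yaml
# cloudflared config.yml sketch — internal/admin tunnel
tunnel: <internal-tunnel-id>
credentials-file: /etc/cloudflared/<internal-tunnel-id>.json

ingress:
  - hostname: grafana.example.com
    service: http://grafana:3000      # admin UI, only reachable via the tunnel
  - hostname: traefik.example.com
    service: http://traefik:9000      # Traefik dashboard
  - service: http_status:404          # required catch-all: reject anything unmatched
```

Because the admin UIs never expose a public port, the tunnel (plus Cloudflare Access policies, if enabled) is the only way in.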
Shared Services
- PostgreSQL and Temporal
- Shared across all PoC apps, with future isolation options for scaling
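A rough compose sketch of the shared layer, using Temporal's `auto-setup` image pointed at the same Postgres instance (credentials are obvious placeholders; real secrets don't belong in the file):

```yaml
# shared-services compose sketch — credentials are placeholders
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: changeme        # placeholder; inject via secrets in practice
    volumes:
      - pgdata:/var/lib/postgresql/data

  temporal:
    image: temporalio/auto-setup:latest
    environment:
      DB: postgres12
      POSTGRES_SEEDS: postgres           # resolves via the compose network
      POSTGRES_USER: postgres
      POSTGRES_PWD: changeme
    depends_on:
      - postgres

volumes:
  pgdata:
```

If a PoC later needs isolation, it gets its own database (or its own instance) without touching the others.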
Monitoring Stack (Separate blog post coming soon)
- Prometheus, Grafana, Loki, Promtail
- Host metrics via node-exporter, container metrics via cAdvisor
- Logs collected via Docker socket (not log drivers)
- Preconfigured Grafana dashboards at http://localhost:9120
- Ready for AI-assisted log analysis and bottleneck debugging
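Collecting logs via the Docker socket instead of log drivers means Promtail discovers containers itself. A minimal sketch of that scrape config (Loki hostname and label choices are illustrative):

```yaml
# promtail-config.yml sketch — container logs via the Docker socket
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: 'container'   # one label per container keeps cardinality low
```

The upside over per-container log drivers: nothing in the app compose files changes, and restarted containers are picked up automatically.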
3. Debug-Driven DevOps
I tested and debugged every service manually in the terminal — not just to make it work, but to understand why it works (or fails). This included:
- Verifying Traefik certificate renewals
- Troubleshooting Cloudflare tunnel and DNS behavior
- Dealing with Loki label configuration and Promtail edge cases
4. Snapshot → Wipe → Rebuild
After validating the entire stack, I created a snapshot and wiped the server. Clean state, no cruft.
What’s next?
Rebuilding the same infrastructure with Pulumi, with Kubernetes likely following later. For now, the goal is controlled complexity — and a better understanding of where automation makes sense.
TL;DR
Built an initial infrastructure setup with:
- Traefik for HTTPS and routing
- Docker-based monitoring stack (Prometheus, Grafana, Loki)
- Temporal for orchestrated background workflows
- PostgreSQL for shared persistence
- Cloudflare Tunnel for secure admin access
- Manual root deployments (IaC in progress)
Next step: Pulumi-based automation, and eventually Kubernetes — but only when the payoff outweighs the extra complexity.
Written by

Arnold Lovas
Senior full-stack dev with an AI twist. I build weirdly useful things on my own infrastructure — often before coffee.