Real-World Azure Notes: “Kubernetes the Hard Way”—Troubles, Tweaks & Triumphs



If you’re reading this, you’ve probably just finished (or are halfway through) Kubernetes the Hard Way and are hitting snags unique to Azure, or to any cloud that behaves a bit differently from GCP or your laptop.
These are my hard-won notes and best practices, distilled from many hours of fighting with jumpboxes, DNS, and SSH on Azure. They work well as an appendix or notes section alongside your main blog post.


🚀 Architecture Reminders

  • Jumpbox: All SSH/scp/file transfers route through this bastion host.

  • Control Plane (server): Runs kube-apiserver, etcd, controllers.

  • Nodes (node-0 & node-1): Run kubelet, kube-proxy, and host your pods.

  • VNet/Subnet: The control plane and all nodes share one VNet and subnet. Each VM has a private IP and, for management, a public IP.


🗝️ Keys & Jumpbox Operations

  • Every node and the control plane has a unique SSH PEM key (thanks, Azure!).

  • All keys are uploaded to the jumpbox; don’t try to scp directly from local to every node.

  • Always use absolute paths and set strict permissions:

      chmod 600 node0key.pem
      scp -i node0key.pem <file> azureuser@<NODE_0_PUBLIC_IP>:~
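
The permission and path advice above can be wrapped in a small guard that runs before every ssh/scp call. This is a minimal sketch, not part of any tool: ensure_key_private is a hypothetical helper name, and it assumes the keys are regular files on the jumpbox.

```shell
# Sketch: make sure a PEM key is private before handing it to ssh/scp.
# ensure_key_private is a hypothetical helper name used for illustration.
ensure_key_private() {
  local key="$1"
  # Insist on an absolute path so the key is found from any directory.
  case "$key" in
    /*) ;;
    *) echo "error: use an absolute path: $key" >&2; return 1 ;;
  esac
  if [ ! -f "$key" ]; then
    echo "error: key not found: $key" >&2
    return 1
  fi
  # ssh refuses private keys that are readable by group or others.
  chmod 600 "$key"
}
```

A call might look like ensure_key_private /home/azureuser/node0key.pem followed by the scp command; the directory is only an example, so use wherever you uploaded the keys.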
    

📦 cfssl Binaries: Downloading Hurdles

  • Downloading cfssl and cfssljson directly on Azure VMs is often unreliable (timeouts, SSL errors).

  • Workaround:

    • Download the binaries on your laptop.

    • Upload to jumpbox.

    • scp/push to each VM/control plane.

    • chmod +x everywhere.

  • Why this matters: Cloud sandboxing, firewall rules, or slow connections will eat your daylight. Local download is safer and more reliable!
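
The jumpbox-to-VM part of the workaround can be scripted roughly as follows. This is a sketch under assumptions: push_cfssl and the RUN variable are hypothetical names, the binaries and PEM keys are assumed to sit in one directory on the jumpbox, and azureuser is the login used elsewhere in these notes. By default it only prints the commands so you can review them first.

```shell
# Sketch: push cfssl and cfssljson from the jumpbox to each VM and mark them executable.
# push_cfssl and RUN are hypothetical names; adjust paths and key names to your setup.
push_cfssl() {
  # $1: directory holding the binaries and the PEM keys
  # remaining args: "ip=keyfile" pairs, one per VM
  local dir="$1"; shift
  local pair ip key
  for pair in "$@"; do
    ip="${pair%%=*}"
    key="${pair#*=}"
    # RUN defaults to "echo", so commands are printed (dry run).
    # Call with RUN= (set but empty) to actually execute them.
    ${RUN-echo} scp -i "$dir/$key" "$dir/cfssl" "$dir/cfssljson" "azureuser@$ip:~"
    ${RUN-echo} ssh -i "$dir/$key" "azureuser@$ip" "chmod +x cfssl cfssljson"
  done
}
```

For example, push_cfssl /home/azureuser 10.0.0.4=node0key.pem prints the two commands for node-0, and RUN= push_cfssl /home/azureuser 10.0.0.4=node0key.pem runs them.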


📇 /etc/hosts: Manual Mapping for DNS

  • Private Hostnames: Manually map server.kubernetes.local, node-0, and node-1 to private IPs on every VM and your jumpbox.

      10.0.0.6   server.kubernetes.local
      10.0.0.4   node-0
      10.0.0.7   node-1
    
  • Public IPs (Troubleshooting Workaround):
    If internal/private names still don’t let the control plane talk to nodes (for kubectl logs, exec, port-forward), add public IP mappings as well:

      <NODE_0_PUBLIC_IP>   node-0
      <NODE_1_PUBLIC_IP>   node-1
      <SERVER_PUBLIC_IP>   server.kubernetes.local
    
  • Pro-tip: Remove or comment out public mappings after validating private network connectivity, especially for production or locked-down VNet setups.
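
Because these mappings get edited repeatedly (private first, public as a workaround, then back), it helps to make the edit idempotent. The following is a sketch only: set_host_entry is a hypothetical helper, and it takes the file as an optional third argument so you can try it on a copy before touching the real /etc/hosts.

```shell
# Sketch: add or update a hostname mapping without creating duplicates.
# set_host_entry is a hypothetical helper name used for illustration.
set_host_entry() {
  local ip="$1" name="$2" file="${3:-/etc/hosts}"
  local tmp
  tmp="$(mktemp)" || return 1
  # Keep comments; drop any line whose hostname columns include $name.
  # Note: this drops the whole line, including other aliases on it.
  awk -v n="$name" '
    /^[[:space:]]*#/ { print; next }
    { for (i = 2; i <= NF; i++) if ($i == n) next; print }
  ' "$file" > "$tmp"
  printf '%s   %s\n' "$ip" "$name" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Writing to /etc/hosts needs root, so one option is to run the function against a copy (for example /tmp/hosts.new), review the result, and then sudo cp it into place on each VM.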


🌐 Manual Pod Network Routing

  • Without a CNI plugin, set up every pod CIDR route manually:

      # On server:
      sudo ip route add 10.200.0.0/24 via 10.0.0.4
      sudo ip route add 10.200.1.0/24 via 10.0.0.7
      # On each node, add routes for all other pod subnets over the correct node’s private IP.
    
  • Double check pod communication with ping and logs!
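
To avoid typing a different set of routes on each machine, the rule "route every other node's pod subnet via that node's private IP" can be generated from one table. This is a small sketch: pod_routes_for is a hypothetical helper, and the addresses are the example values from this section. It only prints the commands so you can check them before running anything.

```shell
# Sketch: print the pod-CIDR routes a given machine needs.
# pod_routes_for is a hypothetical helper name used for illustration.
pod_routes_for() {
  # $1: this machine's private IP
  # remaining args: "node_ip=pod_cidr" pairs for every node
  local self="$1"; shift
  local pair ip cidr
  for pair in "$@"; do
    ip="${pair%%=*}"
    cidr="${pair#*=}"
    # A node reaches its own pod subnet locally, so skip its own entry.
    [ "$ip" = "$self" ] && continue
    echo "sudo ip route add $cidr via $ip"
  done
}
```

Running pod_routes_for 10.0.0.6 10.0.0.4=10.200.0.0/24 10.0.0.7=10.200.1.0/24 on the server prints the two commands shown above; with 10.0.0.4 as the first argument it prints only the route to node-1's pod subnet.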


🛠️ Common Azure-Specific Errors & Fixes

1. Controller Manager/Pod Creation Failure:

  • Error: Pods are never created, and kubectl shows no relevant error.

  • Fix: Check controller-manager logs for “no such host: server.kubernetes.local”; update /etc/hosts on every server and restart kube-controller-manager.

2. Node DNS Resolution for Management Traffic:

  • Error:

    • kubectl logs <pod> fails with: no such host: node-1

    • Or: error dialing backend... no such host

  • Fix: Update the /etc/hosts hostname mappings on every machine (server, nodes, and jumpbox).
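
For both DNS errors above, a quick way to see which names are missing is to pull them out of the component logs. This is a sketch with a loud assumption: it expects Go-style resolver messages such as "lookup node-1 on 127.0.0.53:53: no such host", which may not match the exact wording you see. unresolved_hosts is a hypothetical helper name.

```shell
# Sketch: list hostnames that component logs report as unresolvable.
# Assumes Go-style resolver errors such as
#   "dial tcp: lookup node-1 on 127.0.0.53:53: no such host"
# unresolved_hosts is a hypothetical helper name used for illustration.
unresolved_hosts() {
  # Reads log lines on stdin and prints each failing hostname once.
  sed -n 's/.*lookup \([^ :]*\).*no such host.*/\1/p' | sort -u
}
```

If the control plane components run as systemd units, something like journalctl -u kube-controller-manager --no-pager | unresolved_hosts gives the list (the unit name is an assumption about your setup), and getent hosts <name> confirms each one after you fix /etc/hosts.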

3. Port-Forward Port in Use:

  • Error:

    • Unable to listen on port 8080: bind: address already in use
  • Fix: Switch to another port, or use lsof -i :8080 and kill the conflicting process.

4. PEM Key or scp “Permission denied”:

  • Error:

    • Warning: Identity file ... not accessible ... Permission denied (publickey).
  • Fix: Upload PEMs to jumpbox, chmod 600, always use full path with scp/ssh.


🔒 Security & Blogging Best Practices

  • Redact all public IPs in any output or screenshot before posting. Use <NODE_0_PUBLIC_IP> or similar.

  • Describe every Azure-specific tweak for clarity.

  • Lock down Azure NSGs: Allow SSH only to the jumpbox, and only from trusted IPs.
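
Redacting by hand is easy to get wrong, so terminal output can be piped through a small filter before it goes into a post. This is a sketch, not a complete scrubber: redact_ips is a hypothetical helper, it only replaces the addresses you list, and the example address comes from the documentation range 203.0.113.0/24.

```shell
# Sketch: replace known public IPs with placeholders before publishing output.
# redact_ips is a hypothetical helper; pass PLACEHOLDER=IP pairs as arguments.
redact_ips() {
  local pair name ip pattern script=""
  for pair in "$@"; do
    name="${pair%%=*}"
    ip="${pair#*=}"
    # Escape dots so they match literally instead of "any character".
    pattern="$(printf '%s' "$ip" | sed 's/\./\\./g')"
    # Plain substring match: 203.0.113.1 would also match inside
    # 203.0.113.10, so list longer addresses first.
    script="${script}s/${pattern}/<${name}>/g;"
  done
  # Filter stdin to stdout with the collected substitutions.
  sed "$script"
}
```

For example, kubectl get nodes -o wide | redact_ips NODE_0_PUBLIC_IP=203.0.113.10 leaves private addresses untouched and swaps the listed public one for <NODE_0_PUBLIC_IP>.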


✨ Lessons and Final Thoughts

  • Cloud networking, DNS, and firewalls WILL break your cluster until you know how to bridge gaps with /etc/hosts and manual routes.

  • cfssl is incredible once set up, but be ready for download hacks.

  • Your jumpbox is your best friend and nerve center—script everything from there.

  • Document every fix—your future self (or your readers) will need it at 2am.



Written by

Ashutosh Kandpal