QDrant Multi-node Cluster Deployment on AWS EC2 with Helm Charts

Ummer Farooq

Prerequisites

  • AWS Account with appropriate permissions

  • Basic knowledge of Kubernetes and Helm

  • SSH key pair for EC2 access

Phase 1: AWS Infrastructure Setup

Step 1: Create VPC and Networking

  1. Create VPC

    • Go to AWS Console → VPC → Create VPC

    • Name: qdrant-vpc

    • IPv4 CIDR: 10.0.0.0/16

    • Enable DNS hostnames and DNS resolution

  2. Create Subnets

    • Create 3 private subnets in different AZs:

      • qdrant-subnet-1a: 10.0.1.0/24 (ap-south-1a)

      • qdrant-subnet-1b: 10.0.2.0/24 (ap-south-1b)

      • qdrant-subnet-1c: 10.0.3.0/24 (ap-south-1c)

    • Create 1 public subnet for NAT Gateway:

      • qdrant-public-subnet: 10.0.100.0/24 (ap-south-1a)

  3. Create Internet Gateway

    • Name: qdrant-igw

    • Attach to qdrant-vpc

  4. Create NAT Gateway

    • Place in qdrant-public-subnet

    • Allocate Elastic IP

  5. Configure Route Tables

    • Public Route Table:

      • Route: 0.0.0.0/0 → Internet Gateway

    • Private Route Table:

      • Route: 0.0.0.0/0 → NAT Gateway

      • Associate with all private subnets
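
If you prefer scripting over console clicks, the same networking can be sketched with the AWS CLI (IDs captured in shell variables; only one subnet shown, repeat create-subnet per AZ):

# Sketch: VPC, one subnet, internet gateway, and NAT prerequisites via the AWS CLI
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 \
  --query 'Vpc.VpcId' --output text)
aws ec2 modify-vpc-attribute --vpc-id $VPC_ID --enable-dns-hostnames "{\"Value\":true}"

SUBNET_1A=$(aws ec2 create-subnet --vpc-id $VPC_ID --cidr-block 10.0.1.0/24 \
  --availability-zone ap-south-1a --query 'Subnet.SubnetId' --output text)

IGW_ID=$(aws ec2 create-internet-gateway \
  --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --vpc-id $VPC_ID --internet-gateway-id $IGW_ID

# The NAT gateway needs an Elastic IP and must live in the public subnet
EIP_ALLOC=$(aws ec2 allocate-address --domain vpc --query 'AllocationId' --output text)
aws ec2 create-nat-gateway --subnet-id <PUBLIC_SUBNET_ID> --allocation-id $EIP_ALLOC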

Step 2: Security Groups

  1. Create Security Group: qdrant-cluster-sg

    • VPC: qdrant-vpc

    • Inbound Rules:

      • SSH: Port 22 (Source: Your IP)

      • Kubernetes API: Port 6443 (Source: Security Group itself)

      • QDrant HTTP: Port 6333 (Source: Security Group itself)

      • QDrant gRPC: Port 6334 (Source: Security Group itself)

      • Etcd: Ports 2379-2380 (Source: Security Group itself)

      • Kubelet: Port 10250 (Source: Security Group itself)

      • NodePort Range: Ports 30000-32767 (Source: Security Group itself)

      • All Traffic: All ports (Source: Security Group itself)

    Note: the catch-all rule above already covers the port-specific rules before it; they are listed separately so you can later drop the catch-all and allow only the ports the cluster actually needs.

    • Outbound Rules: All traffic to 0.0.0.0/0
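
The same security group can be sketched with the CLI; the key trick is the self-referencing rule for intra-cluster traffic (replace <YOUR_IP> with your address):

# Sketch: security group with SSH from your IP and all intra-cluster traffic
SG_ID=$(aws ec2 create-security-group --group-name qdrant-cluster-sg \
  --description "QDrant cluster" --vpc-id $VPC_ID \
  --query 'GroupId' --output text)

aws ec2 authorize-security-group-ingress --group-id $SG_ID \
  --protocol tcp --port 22 --cidr <YOUR_IP>/32

# Source is the security group itself: covers 6333/6334/6335, etcd, kubelet, NodePorts
aws ec2 authorize-security-group-ingress --group-id $SG_ID \
  --protocol -1 --source-group $SG_ID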

Step 3: IAM Roles and Policies

  1. Create IAM Role: qdrant-node-role

    • Trusted entity: EC2

    • Attach policies:

      • AmazonEC2FullAccess (broad; fine for a lab, but scope this down for production)

      • AmazonEBSCSIDriverPolicy

    • Create custom policy QDrantEBSPolicy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:AttachVolume",
                "ec2:DetachVolume",
                "ec2:DescribeVolumes",
                "ec2:DescribeInstances",
                "ec2:CreateVolume",
                "ec2:DeleteVolume",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot",
                "ec2:DescribeSnapshots",
                "ec2:CreateTags"
            ],
            "Resource": "*"
        }
    ]
}

  2. Create Instance Profile

    • Name: qdrant-instance-profile

    • Add role: qdrant-node-role
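
For reference, the role and instance profile can also be created from the CLI; the two file:// policy documents are assumed to hold an EC2 trust policy and the QDrantEBSPolicy JSON above:

# Sketch: IAM role, inline policy, and instance profile via the AWS CLI
aws iam create-role --role-name qdrant-node-role \
  --assume-role-policy-document file://ec2-trust-policy.json

aws iam put-role-policy --role-name qdrant-node-role \
  --policy-name QDrantEBSPolicy --policy-document file://qdrant-ebs-policy.json

aws iam create-instance-profile --instance-profile-name qdrant-instance-profile
aws iam add-role-to-instance-profile \
  --instance-profile-name qdrant-instance-profile --role-name qdrant-node-role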

Phase 2: EC2 Instances Setup

Step 4: Launch EC2 Instances

Launch 3 EC2 instances with the following specifications:

Instance Configuration:

  • AMI: Ubuntu 22.04 LTS

  • Instance Type: t3.medium (minimum) or t3.large (recommended)

  • Key Pair: Your SSH key

  • VPC: qdrant-vpc

  • Subnets: Place each instance in different subnets

  • Security Group: qdrant-cluster-sg

  • IAM Role: qdrant-instance-profile

  • Storage: 20GB gp3 root volume + 50GB gp3 data volume for each instance

Instance Names:

  • qdrant-master-1 (in qdrant-subnet-1a)

  • qdrant-worker-1 (in qdrant-subnet-1b)

  • qdrant-worker-2 (in qdrant-subnet-1c)

Step 5: Create Additional EBS Volumes

For each instance, create an additional EBS volume for persistent storage. (If you rely solely on dynamic provisioning via the EBS CSI driver in Phase 4, PVCs create their own volumes and this step is optional.) In the console:

  1. Go to EC2 → Volumes → Create Volume

  2. Create 3 volumes (one per instance):

    • Volume Type: gp3

    • Size: 50GB each

    • Availability Zone: Match instance AZ

    • Tags: Name = qdrant-data-volume-{1,2,3}

  3. Attach each volume to its corresponding instance (a CLI sketch follows)
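
A CLI sketch for one volume (repeat per instance, matching each AZ); on Nitro instance types the attached volume shows up as an NVMe device:

# Sketch: create and attach one 50GB gp3 data volume
VOL_ID=$(aws ec2 create-volume --volume-type gp3 --size 50 \
  --availability-zone ap-south-1a \
  --tag-specifications 'ResourceType=volume,Tags=[{Key=Name,Value=qdrant-data-volume-1}]' \
  --query 'VolumeId' --output text)

aws ec2 attach-volume --volume-id $VOL_ID \
  --instance-id <INSTANCE_ID> --device /dev/sdf

# Verify on the instance; the volume typically appears as /dev/nvme1n1
lsblk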

Phase 3: Kubernetes Cluster Setup

Step 6: Install Prerequisites on All Nodes

SSH into each instance and run:

#!/bin/bash
# Update system
sudo apt update && sudo apt upgrade -y

# Install Docker and containerd (kubeadm will use containerd as the container runtime)
sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io

# Configure Docker
sudo usermod -aG docker $USER
sudo systemctl enable docker
sudo systemctl start docker

# Install kubeadm, kubelet, kubectl
# (the legacy apt.kubernetes.io repo is deprecated and frozen; use pkgs.k8s.io)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# Configure containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd

# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Load kernel modules required by containerd and kube-proxy
sudo modprobe overlay
sudo modprobe br_netfilter
printf 'overlay\nbr_netfilter\n' | sudo tee /etc/modules-load.d/k8s.conf

# Configure sysctl
sudo tee /etc/sysctl.d/k8s.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system

Step 7: Initialize Master Node

On the master node (qdrant-master-1):

# Initialize cluster
sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=<MASTER_PRIVATE_IP>

# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install Calico CNI
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml

# Generate join command (save this output)
kubeadm token create --print-join-command

Step 8: Join Worker Nodes

On both worker nodes, run the join command saved from the previous step:

sudo kubeadm join <MASTER_IP>:6443 --token <TOKEN> --discovery-token-ca-cert-hash <HASH>

Step 9: Verify Cluster

On master node:

kubectl get nodes
kubectl get pods -A

Phase 4: Storage Setup

Step 10: Install EBS CSI Driver

# Install EBS CSI Driver
kubectl apply -k "github.com/kubernetes-sigs/aws-ebs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.23"

# Verify installation
kubectl get pods -n kube-system | grep ebs-csi

Step 11: Create Storage Class

Create ebs-storageclass.yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

Apply the storage class:

kubectl apply -f ebs-storageclass.yaml
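
Before deploying QDrant, it is worth confirming the storage class is usable. A minimal test PVC (hypothetical name test-claim) can be applied; note that with WaitForFirstConsumer it will show Pending until a pod mounts it, which is expected:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-gp3
  resources:
    requests:
      storage: 1Gi

Clean up afterwards with kubectl delete pvc test-claim.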

Phase 5: Helm and QDrant Deployment

Step 12: Install Helm

On master node:

curl https://baltocdn.com/helm/signing.asc | gpg --dearmor | sudo tee /usr/share/keyrings/helm.gpg > /dev/null
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/helm.gpg] https://baltocdn.com/helm/stable/debian/ all main" | sudo tee /etc/apt/sources.list.d/helm-stable-debian.list
sudo apt update
sudo apt install helm

Step 13: Add QDrant Helm Repository

helm repo add qdrant https://qdrant.github.io/qdrant-helm
helm repo update
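
To confirm the repo is usable and see which chart versions are available:

helm search repo qdrant

# Optional: dump the chart's defaults to compare against the values file below
helm show values qdrant/qdrant > default-values.yaml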

Step 14: Create QDrant Values File

Create qdrant-values.yaml:

# QDrant Cluster Configuration
replicaCount: 3

image:
  repository: qdrant/qdrant
  tag: "v1.7.4"
  pullPolicy: IfNotPresent

# Service configuration
service:
  type: NodePort
  httpPort: 6333
  grpcPort: 6334
  httpNodePort: 30333
  grpcNodePort: 30334

# Persistent storage
persistence:
  enabled: true
  storageClass: "ebs-gp3"
  size: 50Gi
  accessMode: ReadWriteOnce

# Resource limits
resources:
  limits:
    cpu: 1000m
    memory: 2Gi
  requests:
    cpu: 500m
    memory: 1Gi

# Pod disruption budget
podDisruptionBudget:
  enabled: true
  minAvailable: 2

# Anti-affinity to spread pods across nodes
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - qdrant
        topologyKey: kubernetes.io/hostname

# QDrant specific configuration
config:
  cluster:
    enabled: true
    p2p:
      port: 6335
  service:
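    # Placeholder keys: in production, inject these via a Kubernetes Secret instead of hard-coding them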
    api_key: "your_secret_master_api_key_here"
    read_only_api_key: "your_secret_read_only_api_key_here"
    http_port: 6333
    grpc_port: 6334
  storage:
    storage_path: "/qdrant/storage"
    snapshots_path: "/qdrant/snapshots"
    on_disk_payload: true
  log_level: "INFO"

# Environment variables for clustering
env:
  - name: QDRANT__CLUSTER__ENABLED
    value: "true"
  - name: QDRANT__CLUSTER__P2P__PORT
    value: "6335"

# Security context
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

# Node selector to ensure pods are scheduled on our nodes
nodeSelector: {}

# Tolerations
tolerations: []

Step 15: Deploy QDrant Cluster

# Create namespace
kubectl create namespace qdrant

# Deploy QDrant
helm install qdrant qdrant/qdrant \
  --namespace qdrant \
  --values qdrant-values.yaml \
  --wait

# Verify deployment
kubectl get pods -n qdrant
kubectl get pvc -n qdrant
kubectl get svc -n qdrant

Step 16: Create Load Balancer Service (Optional)

For external access, create qdrant-lb.yaml:

apiVersion: v1
kind: Service
metadata:
  name: qdrant-loadbalancer
  namespace: qdrant
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: qdrant
  ports:
    - name: http
      port: 6333
      targetPort: 6333
    - name: grpc
      port: 6334
      targetPort: 6334

Apply the load balancer:

kubectl apply -f qdrant-lb.yaml

Note that on a self-managed kubeadm cluster a LoadBalancer service only receives an external address if the AWS cloud controller manager is installed; without it the service stays in Pending, and the NodePort service from the values file remains the way in from outside.

Phase 6: Verification and Testing

Step 17: Verify Cluster Status

# Check pods
kubectl get pods -n qdrant -o wide

# Check persistent volumes
kubectl get pv
kubectl get pvc -n qdrant

# Check services
kubectl get svc -n qdrant

# Check logs
kubectl logs -n qdrant -l app.kubernetes.io/name=qdrant

# Port forward for testing (run in background)
kubectl port-forward -n qdrant svc/qdrant 6333:6333 &

Step 18: Test QDrant API

# Test cluster info (the api-key header must match the key set in qdrant-values.yaml)
curl -X GET "http://localhost:6333/cluster" \
  -H "api-key: your_secret_master_api_key_here"

# Test collections
curl -X GET "http://localhost:6333/collections" \
  -H "api-key: your_secret_master_api_key_here"

# Create a test collection
curl -X PUT "http://localhost:6333/collections/test_collection" \
  -H "Content-Type: application/json" \
  -H "api-key: your_secret_master_api_key_here" \
  -d '{
    "vectors": {
      "size": 100,
      "distance": "Cosine"
    }
  }'
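
With the API reachable, a short end-to-end test of writes and reads helps. The sketch below uses its own 4-dimensional collection so the vectors fit on one line (the collection name, IDs, and payload values are made up):

# Create a tiny demo collection
API_KEY="your_secret_master_api_key_here"
curl -X PUT "http://localhost:6333/collections/demo" \
  -H "Content-Type: application/json" -H "api-key: $API_KEY" \
  -d '{"vectors": {"size": 4, "distance": "Cosine"}}'

# Upsert one point with a payload
curl -X PUT "http://localhost:6333/collections/demo/points" \
  -H "Content-Type: application/json" -H "api-key: $API_KEY" \
  -d '{"points": [{"id": 1, "vector": [0.1, 0.2, 0.3, 0.4], "payload": {"city": "Srinagar"}}]}'

# Search for the nearest neighbours of a query vector
curl -X POST "http://localhost:6333/collections/demo/points/search" \
  -H "Content-Type: application/json" -H "api-key: $API_KEY" \
  -d '{"vector": [0.1, 0.2, 0.3, 0.4], "limit": 3}'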

Phase 7: Monitoring and Maintenance

Step 19: Set Up Basic Monitoring

If you run Prometheus, create monitoring-values.yaml (illustrative; verify the exact keys against your chart version with helm show values):

prometheus:
  enabled: true
  serviceMonitor:
    enabled: true
    namespace: qdrant
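
Whether or not you wire up Prometheus, QDrant itself serves metrics in Prometheus text format on its HTTP port, so a quick check works through the existing port-forward:

# Scrape QDrant's built-in metrics endpoint
curl -H "api-key: your_secret_master_api_key_here" "http://localhost:6333/metrics"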

Step 20: Backup Strategy

Create backup script backup-qdrant.sh:

#!/bin/bash
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="./backup/qdrant_$TIMESTAMP"
mkdir -p "$BACKUP_DIR"

# Create a full storage snapshot on each pod via the API (api-key matches qdrant-values.yaml)
for pod in $(kubectl get pods -n qdrant -l app.kubernetes.io/name=qdrant -o jsonpath='{.items[*].metadata.name}'); do
  kubectl exec -n qdrant "$pod" -- curl -s -X POST "http://localhost:6333/snapshots" \
    -H "api-key: your_secret_master_api_key_here"
done

# Copy the snapshot directory off the first pod's persistent volume
kubectl exec -n qdrant qdrant-0 -- tar -czf /tmp/qdrant-backup-$TIMESTAMP.tar.gz /qdrant/snapshots
kubectl cp qdrant/qdrant-0:/tmp/qdrant-backup-$TIMESTAMP.tar.gz "$BACKUP_DIR/qdrant-backup-$TIMESTAMP.tar.gz"
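
Restores can be exercised the same way. For per-collection snapshots, QDrant exposes a recover endpoint; the snapshot location below is a placeholder path on the pod's volume:

API_KEY="your_secret_master_api_key_here"

# Create a per-collection snapshot (complements the full-storage snapshots above)
curl -X POST "http://localhost:6333/collections/test_collection/snapshots" \
  -H "api-key: $API_KEY"

# Recover the collection from a snapshot file
curl -X PUT "http://localhost:6333/collections/test_collection/snapshots/recover" \
  -H "Content-Type: application/json" -H "api-key: $API_KEY" \
  -d '{"location": "file:///qdrant/snapshots/test_collection/<SNAPSHOT_NAME>.snapshot"}'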

Troubleshooting

Common Issues and Solutions

  1. Pods stuck in Pending state:

    • Check node resources: kubectl describe nodes

    • Check PVC status: kubectl get pvc -n qdrant

    • Verify EBS CSI driver: kubectl get pods -n kube-system | grep ebs-csi

  2. Storage issues:

    • Verify IAM permissions for EBS operations

    • Check storage class: kubectl get storageclass

    • Review EBS volume attachments in AWS Console

  3. Network connectivity issues:

    • Verify security group rules

    • Check Calico pod status: kubectl get pods -n kube-system | grep calico

    • Test pod-to-pod connectivity

  4. QDrant cluster formation issues:

    • Check cluster configuration in pod logs

    • Verify p2p port accessibility between pods

    • Review QDrant cluster API endpoint

Maintenance Commands

# Scale cluster
helm upgrade qdrant qdrant/qdrant --namespace qdrant --set replicaCount=5 --values qdrant-values.yaml

# Update QDrant version
helm upgrade qdrant qdrant/qdrant --namespace qdrant --set image.tag=v1.8.0 --values qdrant-values.yaml

# Backup: run the script from Step 20
./backup-qdrant.sh

Security Considerations

  1. Network Security:

    • Use private subnets for all worker nodes

    • Restrict security group access to minimum required ports

    • Consider using AWS PrivateLink for internal communication

  2. Storage Security:

    • Enable EBS encryption

    • Use IAM roles with least privilege

    • Regular backup testing and restoration procedures

  3. Access Control:

    • Implement RBAC in Kubernetes

    • Use network policies to restrict pod communication (see the sketch after this list)

    • Enable audit logging
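
As a starting point for the network-policy item above, a sketch that assumes the Calico CNI from Step 7 (which enforces NetworkPolicy) and the chart's app.kubernetes.io/name=qdrant label; the qdrant-client namespace label is an assumption you would define yourself:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: qdrant-restrict
  namespace: qdrant
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: qdrant
  policyTypes:
    - Ingress
  ingress:
    # QDrant pods may reach each other on all ports (HTTP 6333, gRPC 6334, p2p 6335)
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: qdrant
    # Labelled client namespaces may reach the API ports only
    - from:
        - namespaceSelector:
            matchLabels:
              qdrant-client: "true"
      ports:
        - port: 6333
        - port: 6334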

This deployment provides a production-ready QDrant cluster with high availability, persistent storage, and proper AWS integration.
