Don't Lose Your Data: Understanding Kubernetes Persistent Storage

Shrihari Bhat
6 min read

Introduction

So far in our Kubernetes journey, we've focused on "stateless" applications like the Nginx web server. These apps are simple because they don't need to save data between restarts. If a Pod dies, a new one is created, and no state is lost.

But what about the applications that form the backbone of our systems? Think about a CI/CD server like Jenkins. The Jenkins controller stores all of its critical configuration—jobs, plugins, build history, credentials—on its local filesystem.

Now, imagine you deploy Jenkins in a Kubernetes Pod. Your team spends weeks configuring dozens of complex build pipelines. Then, for a routine node upgrade, the Jenkins Pod is rescheduled to a new node. When it starts up... disaster. All jobs, plugins, and history are gone. The new Pod started with a fresh, empty filesystem.

This is the fundamental challenge of running stateful applications on Kubernetes. By default, the filesystem inside a container is ephemeral (temporary). We need a way to ensure our critical data persists, independent of the Pod's lifecycle.

In this post, we will solve this problem by unraveling the mystery of Kubernetes storage. We'll explore the three key concepts that allow applications like Jenkins and databases to run reliably:

  • Volumes: The basic unit of storage attached to a Pod.

  • PersistentVolumes (PVs): The cluster's available storage resources.

  • PersistentVolumeClaims (PVCs): An application's request for storage.

The Problem: Ephemeral Pods Need Persistent Data

The Jenkins scenario highlights the core issue. The application's state (its /var/jenkins_home directory) is tightly coupled to the running container's filesystem. When the Pod is terminated, the data is lost forever.

To fix this, we need to decouple the storage lifecycle from the Pod lifecycle. We need to store the Jenkins home directory on a persistent disk that exists outside the Pod and can be re-attached to a new Jenkins Pod whenever it starts.

Volumes: Giving Pods a Place to Store Data

The most basic storage concept in Kubernetes is the Volume. A Volume is simply a directory, possibly with some data in it, which is made accessible to the containers in a Pod.

The key feature of a Volume is that its lifecycle is tied to the Pod, not the individual containers within it. If a container in the Pod restarts, the Volume's contents are preserved.

However, when the entire Pod is deleted, ephemeral volume types such as emptyDir are deleted with it, and their data is gone.
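To make the Pod-scoped lifecycle concrete, here is a minimal sketch of a Pod that uses an emptyDir volume as scratch space. The Pod name, image tag, and mount path are hypothetical choices for illustration. If the busybox container crashes and restarts, log.txt is still there; delete the Pod, and the file is deleted with it.

```yaml
# emptydir-demo.yaml (hypothetical example, not part of the Jenkins setup)
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo # Hypothetical name
spec:
  volumes:
    - name: scratch
      emptyDir: {} # Created empty when the Pod is assigned to a node; removed when the Pod is removed
  containers:
    - name: writer
      image: busybox:1.36
      # Append a timestamp every 5 seconds. The file survives container restarts,
      # but not deletion of the Pod itself.
      command: ["sh", "-c", "while true; do date >> /scratch/log.txt; sleep 5; done"]
      volumeMounts:
        - mountPath: /scratch
          name: scratch
```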

Kubernetes supports many types of Volumes, such as emptyDir for temporary scratch space or configMap for injecting configuration files. While these are useful, they don't solve our Jenkins problem. We need storage that lives completely independently of any single Pod. This brings us to the real solution: PersistentVolumes and PersistentVolumeClaims.

The PV/PVC Abstraction: Separating Concerns

To manage persistent storage effectively, Kubernetes uses a brilliant abstraction that separates the concerns of the Cluster Administrator (who provides the storage) from the Application Developer (who consumes the storage).

This separation is achieved through two objects:

  1. PersistentVolume (PV): A piece of storage in the cluster that has been provisioned by an administrator. It's a cluster resource, just like a Node is a cluster resource. PVs are abstractions for the underlying physical storage, like an AWS EBS volume, a GCP Persistent Disk, or an NFS share in your data center.

  2. PersistentVolumeClaim (PVC): A request for storage by a user or application. This is what our Jenkins deployment will use. It's like a Pod consuming Node resources; a PVC consumes PV resources. The user requests a certain size and access mode (e.g., "Jenkins needs 10 GiB of storage that can be mounted read-write by a single node at a time").
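Because a PV is only a description of storage, switching the backend means changing the PV's source block; the Pods that consume a claim don't need to know. As an illustrative sketch, an administrator could expose an NFS share like this. The PV name, size, server address, and export path are all placeholder assumptions, and the cluster nodes would need an NFS client installed.

```yaml
# nfs-pv.yaml (hypothetical; every value below is a placeholder)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-example
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany # NFS can be mounted read-write by many nodes at once
  persistentVolumeReclaimPolicy: Retain # Keep the data after the claim is deleted
  nfs:
    server: 10.0.0.10 # Placeholder NFS server address
    path: /exports/shared # Placeholder export path on that server
```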

The Workflow in Practice

Let's walk through how this works to solve our Jenkins problem.

Step 1: The Administrator Provisions a PersistentVolume (PV)

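The cluster admin creates a PV. For our local Kind/Minikube setup, we can use a hostPath PV, which stands in for network storage by using a directory on the cluster node's own filesystem (inside the Minikube VM or the Kind node container, not on your laptop). Because hostPath data lives on a single node, this approach is only suitable for single-node test clusters.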

Create a file named pv-definition.yaml:

# pv-definition.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: jenkins-pv-storage
spec:
  capacity:
    storage: 10Gi # Size of the volume
  accessModes:
    - ReadWriteOnce # Can be mounted as read-write by a single Node
  hostPath:
    path: "/mnt/data/jenkins" # On Minikube/Kind node, not your laptop!

  • capacity.storage: How much storage this PV provides.

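  • accessModes: Defines how the volume can be mounted. ReadWriteOnce (RWO) means the volume can be mounted read-write by a single node, which suits Jenkins, since only one controller Pod should write to its home directory at a time. Note that RWO restricts access per node, not per Pod; the ReadWriteOncePod mode, supported for CSI volumes, restricts it to a single Pod.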
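The manifest above relies on a few defaults. As a sketch, here is the same PV with them written out, plus one assumption of our own: hostPath.type set to DirectoryOrCreate, so the directory is created on the node if it is missing. A directory created this way gets mode 0755 and the kubelet's ownership, so the Jenkins process, which runs as a non-root user in the official image, may still need write access granted separately.

```yaml
# pv-definition.yaml, variant with defaults spelled out (optional)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: jenkins-pv-storage
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem # Default: present the volume as a directory, not a raw block device
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain # Default for manually created PVs: keep the data after the PVC is deleted
  hostPath:
    path: "/mnt/data/jenkins"
    type: DirectoryOrCreate # Our assumption: create the directory on the node if it does not exist
```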

Step 2: The Developer Creates a PersistentVolumeClaim (PVC)

Now, the developer deploying Jenkins needs storage. They don't need to know about hostPath or AWS. They just ask for what they need for the Jenkins home directory.

Create a file named pvc-definition.yaml:

# pvc-definition.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-pvc-claim
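spec:
  storageClassName: "" # Bind to a pre-created PV with no class, instead of the cluster's default StorageClass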
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi # Request 8Gi of storage for Jenkins data

The developer requests 8Gi of storage with ReadWriteOnce access.
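By default, any PV that fits can satisfy the claim. If you want to be sure this claim binds to jenkins-pv-storage specifically, one option is to pre-bind it with spec.volumeName, sketched below; the control plane still checks that the storage class, access modes, and size are compatible. The empty storageClassName matters on Kind and Minikube, which ship with a default StorageClass that would otherwise be applied to the claim and provision a new volume dynamically.

```yaml
# pvc-definition.yaml, variant pre-bound to a specific PV (optional)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-pvc-claim
spec:
  storageClassName: "" # Must match the PV's (empty) class
  volumeName: jenkins-pv-storage # Bind to this PV only, not to any PV that happens to fit
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
```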

Step 3: Kubernetes Binds the PVC to a suitable PV

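When the PVC is created, the Kubernetes control plane looks for a PV that can satisfy the claim: enough capacity, the requested access modes, and a matching storage class. Our 10Gi PV can satisfy the 8Gi request, so Kubernetes binds the two together, and both report the status Bound. Binding is one-to-one: the claim receives the entire 10Gi PV, and no other PVC can use it.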
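One way to confirm the binding is to inspect both objects with kubectl get pv jenkins-pv-storage -o yaml and kubectl get pvc jenkins-pvc-claim -o yaml. The abridged excerpt below sketches the fields that record the binding, assuming the claim lives in the default namespace; real output includes many more fields, such as uid and resourceVersion.

```yaml
# Abridged: kubectl get pv jenkins-pv-storage -o yaml
spec:
  claimRef: # Filled in by the control plane when it binds the PV
    kind: PersistentVolumeClaim
    namespace: default # Assumes the PVC was created in the default namespace
    name: jenkins-pvc-claim
status:
  phase: Bound
---
# Abridged: kubectl get pvc jenkins-pvc-claim -o yaml
spec:
  volumeName: jenkins-pv-storage # The PV this claim is bound to
status:
  phase: Bound
  capacity:
    storage: 10Gi # The whole PV, not just the 8Gi that was requested
```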

Step 4: The Developer Uses the PVC in the Jenkins Deployment

Finally, the developer creates the Jenkins Deployment and references the PVC by name, mounting it as a volume at the correct path (/var/jenkins_home).

Create jenkins-deployment.yaml:

# jenkins-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jenkins
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jenkins
  template:
    metadata:
      labels:
        app: jenkins
    spec:
      volumes:
        - name: jenkins-storage # A name for the volume within the Pod
          persistentVolumeClaim:
            claimName: jenkins-pvc-claim # Reference the PVC we created
      containers:
        - name: jenkins
          image: jenkins/jenkins:lts
          ports:
            - containerPort: 8080
          volumeMounts:
            - mountPath: "/var/jenkins_home" # Mount the volume inside the container
              name: jenkins-storage # Match the volume name from above

  • .spec.template.spec.volumes: We define a volume for the Pod named jenkins-storage. Its source is our PVC, jenkins-pvc-claim.

  • .spec.template.spec.containers[].volumeMounts: We mount that volume into the Jenkins container at the path /var/jenkins_home.
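One refinement worth considering is the Deployment's update strategy. The default RollingUpdate strategy starts the replacement Pod before stopping the old one. If both Pods land on the same node, two Jenkins controllers briefly share one home directory; if the new Pod lands on a different node, it can stay stuck waiting for a ReadWriteOnce volume that the old Pod still holds. A hedged suggestion, sketched as an excerpt of the Deployment spec, is the Recreate strategy, which terminates the old Pod first at the cost of a short outage during each rollout.

```yaml
# jenkins-deployment.yaml (excerpt): only the strategy field is new
spec:
  replicas: 1
  strategy:
    type: Recreate # Stop the old Pod before starting its replacement
  selector:
    matchLabels:
      app: jenkins
```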

Now, when the Jenkins Pod starts, everything written to its home directory is stored on the PersistentVolume rather than in the container's ephemeral filesystem. If this Pod is deleted and the Deployment creates a new one, the new Pod mounts the same claim, and therefore the same PV, at /var/jenkins_home, so all the jobs, plugins, and history are right where we left them. Problem solved!

Conclusion

Managing state is one of the most critical aspects of running real-world applications on Kubernetes. As we saw with the Jenkins example, the ephemeral nature of Pods can be disastrous for stateful applications without a proper storage strategy. The PV/PVC abstraction is a powerful and elegant solution.

Let's recap the core idea:

  • PersistentVolume (PV): The "supply" of storage, managed by the cluster admin.

  • PersistentVolumeClaim (PVC): The "demand" for storage, requested by the application.

  • Binding: Kubernetes matches the demand (PVC) with an available supply (PV).

  • Deployments then use the PVC to give Pods access to persistent, long-term storage.

This separation of concerns allows developers to request storage in a standardized way, regardless of the underlying infrastructure, making applications portable and infrastructure management clean.

What's Next?

Manually creating PVs for every storage request can be tedious. What if we could have storage provisioned automatically whenever a developer creates a PVC? That's exactly what StorageClasses and Dynamic Provisioning are for, and it's the topic of our next post.

Written by

Shrihari Bhat