🛡️ 1. Introduction to SecurityContext in Kubernetes

A SecurityContext in Kubernetes defines privilege and access control settings for pods or containers, allowing you to control how processes run, access resources, and interact with the system. It is a critical component for securing Kubernetes workloads by enforcing least-privilege principles.

Pod-Level SecurityContext 🗂️: Applies security settings to all containers in a pod and can affect the pod’s volumes. It’s defined under spec.securityContext.
Container-Level SecurityContext 🖥️: Applies to a specific container and can override pod-level settings for that container. It’s defined under spec.containers[].securityContext.

The key difference is scope:

Pod-level settings provide a baseline for all containers and volumes in the pod.
Container-level settings allow fine-grained customization for individual containers, overriding pod-level settings where applicable.

📂 2. Pod-Level SecurityContext

The pod-level securityContext is defined in the pod’s spec and applies to all containers in the pod unless overridden by a container-level securityContext. It also applies to certain volume-related settings (e.g., fsGroup and seLinuxOptions).

🛠️ Fields in Pod-Level SecurityContext

Here’s a comprehensive list of fields available at the pod level, their purpose, and examples:

runAsUser 👤:
- Purpose: Specifies the user ID (UID) for all containers’ processes in the pod.
- Use Case: Ensures containers don’t run as root, reducing the risk of privilege escalation.
- Example: A web server pod where all containers should run as a non-root user for security.
```
  apiVersion: v1
  kind: Pod
  metadata:
    name: web-server-pod
  spec:
    securityContext:
      runAsUser: 1000  # All containers run as UID 1000
    containers:
    - name: nginx
      image: nginx
      ports:
      - containerPort: 80
```
  In this example, a web server (e.g., Nginx) runs as UID 1000, preventing root-level access even if the container is compromised.

runAsGroup 👥:

Purpose: Sets the primary group ID (GID) for all containers’ processes.
Use Case: Controls group ownership for files created by containers, useful for shared volumes.

Example: A pod with a shared volume where files need consistent group ownership.

  apiVersion: v1
  kind: Pod
  metadata:
    name: shared-volume-pod
  spec:
    securityContext:
      runAsUser: 1000
      runAsGroup: 3000  # Primary group ID for processes
    volumes:
    - name: shared-data
      emptyDir: {}
    containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "echo hello > /data/testfile && sleep 1h"]
      volumeMounts:
      - name: shared-data
        mountPath: /data

Files created in the /data volume will be owned by GID 3000, ensuring consistent group access.

runAsNonRoot 🔒:
- Purpose: Ensures all containers run as a non-root user (UID ≠ 0). If set to true, Kubernetes rejects the pod if any container tries to run as root.
- Use Case: Enforce a policy where no container in the pod can run as root.
- Example: A corporate policy requires all pods to run non-root for compliance.
```
  apiVersion: v1
  kind: Pod
  metadata:
    name: non-root-pod
  spec:
    securityContext:
      runAsNonRoot: true  # Enforces non-root user
    containers:
    - name: app
      image: nginx
      ports:
      - containerPort: 80
```
  If the container tries to run as root, the pod will fail to start.

fsGroup 💾:

Purpose: Sets the group ID for volume ownership and permissions. Kubernetes applies this GID to volumes that support ownership management (e.g., emptyDir, persistentVolumeClaim).
Use Case: Ensures files in a shared volume are accessible by a specific group, such as in a multi-container pod.

Example: A pod with a shared volume for a data processing application.

  apiVersion: v1
  kind: Pod
  metadata:
    name: data-processing-pod
  spec:
    securityContext:
      runAsUser: 1000
      fsGroup: 2000  # Volume files owned by GID 2000
    volumes:
    - name: data-vol
      emptyDir: {}
    containers:
    - name: processor
      image: busybox
      command: ["sh", "-c", "echo data > /data/output && sleep 1h"]
      volumeMounts:
      - name: data-vol
        mountPath: /data

Files in /data will be owned by GID 2000, ensuring group-level access control.

supplementalGroups 👨‍👩‍👧‍👦:
- Purpose: Adds additional group IDs to container processes, beyond the primary runAsGroup.
- Use Case: Grants access to resources owned by multiple groups, such as shared storage.
- Example: A pod accessing multiple shared volumes with different group ownerships.
```
  apiVersion: v1
  kind: Pod
  metadata:
    name: multi-group-pod
  spec:
    securityContext:
      runAsUser: 1000
      runAsGroup: 3000
      supplementalGroups: [4000, 5000]  # Additional group memberships
    containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 1h"]
```
  Processes in the container belong to GIDs 3000, 4000, and 5000, allowing access to resources owned by these groups.
supplementalGroupsPolicy (Kubernetes v1.33+, beta) 🛠️:
- Purpose: Controls how supplementary groups are calculated. Options are:
  - Merge: Merges groups from the container image’s /etc/group with fsGroup and supplementalGroups.
  - Strict: Only uses groups specified in fsGroup, supplementalGroups, or runAsGroup, ignoring /etc/group.
- Use Case: Avoid unintended group memberships from the container image for stricter security.
- Example: A pod requiring strict group control for compliance.
```
  apiVersion: v1
  kind: Pod
  metadata:
    name: strict-groups-pod
  spec:
    securityContext:
      runAsUser: 1000
      runAsGroup: 3000
      supplementalGroups: [4000]
      supplementalGroupsPolicy: Strict  # Only specified groups are used
    containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 1h"]
```
  The container process will only have GIDs 3000 and 4000, ignoring any groups defined in the image’s /etc/group.

fsGroupChangePolicy ⚙️:

Purpose: Controls how Kubernetes changes ownership and permissions for volumes. Options are:
- OnRootMismatch: Only changes permissions if the volume’s root directory doesn’t match the expected fsGroup.
- Always: Always changes permissions when the volume is mounted.
Use Case: Optimize pod startup time for large volumes by reducing unnecessary permission changes.

Example: A pod with a large persistent volume.

  apiVersion: v1
  kind: Pod
  metadata:
    name: large-volume-pod
  spec:
    securityContext:
      runAsUser: 1000
      fsGroup: 2000
      fsGroupChangePolicy: OnRootMismatch  # Optimize permission changes
    volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pvc
    containers:
    - name: app
      image: busybox
      volumeMounts:
      - name: data
        mountPath: /data

This reduces startup time by only changing permissions when necessary.

seLinuxOptions 🔐:
- Purpose: Assigns SELinux labels to containers and volumes for access control.
- Use Case: Enforce mandatory access control in environments with SELinux enabled (e.g., Red Hat systems).
- Example: A pod running in an SELinux-enabled cluster.
```
  apiVersion: v1
  kind: Pod
  metadata:
    name: selinux-pod
  spec:
    securityContext:
      seLinuxOptions:
        level: "s0:c123,c456"  # SELinux label for processes and volumes
    containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 1h"]
```
  All containers and volumes use the specified SELinux label, ensuring compliance with SELinux policies.
seLinuxChangePolicy (Kubernetes v1.33+, beta) 🔄:
- Purpose: Controls SELinux relabelling behaviour. Options are:
  - MountOption: Uses mount options for faster relabelling (requires SELinuxMount feature gate).
  - Recursive: Recursively relabels all files in the volume.
- Use Case: Optimize SELinux relabelling for performance or allow multiple pods with different labels to share a volume.
- Example: A pod opting out of mount-based relabelling for compatibility.
```
  apiVersion: v1
  kind: Pod
  metadata:
    name: selinux-recursive-pod
  spec:
    securityContext:
      seLinuxOptions:
        level: "s0:c123,c456"
      seLinuxChangePolicy: Recursive  # Recursive relabeling
    containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 1h"]
```
  This ensures recursive relabeling, allowing multiple pods with different SELinux labels to share a volume.
procMount (Kubernetes v1.33+, beta) 🖥️:
- Purpose: Controls the /proc filesystem’s mount behaviour. Options are:
  - Default: Masks certain /proc paths (e.g., /proc/kcore) and makes others read-only.
  - Unmasked: Exposes all /proc paths, useful for nested container runtimes.
- Use Case: Running containers within containers (e.g., Docker-in-Docker).
- Example: A pod running a CI/CD pipeline with nested containers.
```
  apiVersion: v1
  kind: Pod
  metadata:
    name: dind-pod
  spec:
    securityContext:
      procMount: Unmasked  # Expose full /proc
    hostUsers: false  # Required for Unmasked
    containers:
    - name: docker
      image: docker:dind
      command: ["dockerd"]
```
  This allows the Docker daemon to access the full /proc filesystem for container management.

🌐 Real-Life Example for Pod-Level SecurityContext

Scenario: A company runs a microservices application with multiple pods, each containing multiple containers (e.g., an app and a logging sidecar). To comply with security policies, all containers must run as non-root, and shared volumes must be accessible by a specific group.

apiVersion: v1
kind: Pod
metadata:
  name: microservice-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    runAsNonRoot: true
  volumes:
  - name: logs
    emptyDir: {}
  containers:
  - name: app
    image: my-app:1.0
    volumeMounts:
    - name: logs
      mountPath: /logs
  - name: log-collector
    image: fluentd
    volumeMounts:
    - name: logs
      mountPath: /logs

Explanation:

All containers run as UID 1000 and GID 3000.
The logs volume is owned by GID 2000 (fsGroup), ensuring both containers can write to it.
runAsNonRoot: true enforces non-root execution, aligning with compliance requirements.

🖥️ 3. Container-Level SecurityContext

The container-level securityContext is defined under spec.containers[].securityContext and applies only to the specific container. It can override pod-level settings for that container but doesn’t affect volumes.

🛠️ Fields in Container-Level SecurityContext

Here’s a comprehensive list of fields available at the container level:

runAsUser 👤:

Purpose: Overrides the pod-level runAsUser for the specific container.
Use Case: A specific container needs to run as a different user (e.g., root for administrative tasks).

Example: A pod with a sidecar requiring root privileges.

  apiVersion: v1
  kind: Pod
  metadata:
    name: mixed-user-pod
  spec:
    securityContext:
      runAsUser: 1000
    containers:
    - name: app
      image: nginx
      ports:
      - containerPort: 80
    - name: admin-tool
      image: busybox
      command: ["sh", "-c", "sleep 1h"]
      securityContext:
        runAsUser: 0  # Runs as root, overriding pod-level setting

runAsGroup 👥:

Purpose: Overrides the pod-level runAsGroup for the container’s primary group ID.
Use Case: A container needs a different primary group for specific access requirements.

Example: A container accessing a volume with a unique group.

  apiVersion: v1
  kind: Pod
  metadata:
    name: custom-group-pod
  spec:
    securityContext:
      runAsGroup: 3000
    containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 1h"]
      securityContext:
        runAsGroup: 4000  # Overrides pod-level runAsGroup

runAsNonRoot 🔒:

Purpose: Enforces non-root execution for the specific container, overriding pod-level settings.
Use Case: Ensure a specific container adheres to non-root policies, even if the pod allows root.

Example: A sidecar container must run non-root for security.

  apiVersion: v1
  kind: Pod
  metadata:
    name: non-root-sidecar-pod
  spec:
    containers:
    - name: app
      image: nginx
    - name: sidecar
      image: busybox
      command: ["sh", "-c", "sleep 1h"]
      securityContext:
        runAsNonRoot: true

capabilities ⚙️:

Purpose: Adds or drops Linux capabilities for the container.
Use Case: Grant specific privileges (e.g., NET_ADMIN) without full root access.

Example: A container needs to manage network interfaces.

  apiVersion: v1
  kind: Pod
  metadata:
    name: network-admin-pod
  spec:
    containers:
    - name: network-tool
      image: busybox
      command: ["sh", "-c", "sleep 1h"]
      securityContext:
        capabilities:
          add: ["NET_ADMIN"]  # Grants network administration privileges
          drop: ["ALL"]  # Drops all other capabilities

privileged 🔓:

Purpose: Runs the container in privileged mode, granting full root privileges, similar to Docker’s --privileged flag.
Use Case: Rare cases where a container needs unrestricted access (e.g., running a system utility).

Example: A container running a system diagnostic tool.

  apiVersion: v1
  kind: Pod
  metadata:
    name: privileged-pod
  spec:
    containers:
    - name: diagnostic-tool
      image: busybox
      command: ["sh", "-c", "sleep 1h"]
      securityContext:
        privileged: true  # Full root privileges

allowPrivilegeEscalation 🚫:

Purpose: Controls whether a process can gain more privileges than its parent (e.g., via setuid binaries). Set to false to prevent escalation.
Use Case: Prevent containers from escalating privileges in sensitive environments.

Example: A container running untrusted code.

  apiVersion: v1
  kind: Pod
  metadata:
    name: no-escalation-pod
  spec:
    containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 1h"]
      securityContext:
        allowPrivilegeEscalation: false  # Prevents privilege escalation

readOnlyRootFilesystem 📖:

Purpose: Mounts the container’s root filesystem as read-only, preventing modifications.
Use Case: Enhance security by ensuring the container cannot alter its filesystem.

Example: A stateless application container.

  apiVersion: v1
  kind: Pod
  metadata:
    name: readonly-pod
  spec:
    containers:
    - name: app
      image: nginx
      securityContext:
        readOnlyRootFilesystem: true  # Root filesystem is read-only

seccompProfile 🔐:
- Purpose: Specifies a Seccomp profile to filter system calls, enhancing security.
- Options:
  - RuntimeDefault: Uses the container runtime’s default profile.
  - Unconfined: No Seccomp filtering.
  - Localhost: Uses a custom profile from the node.
- Use Case: Restrict dangerous system calls in a container.
- Example: A container with a default Seccomp profile.
```
  apiVersion: v1
  kind: Pod
  metadata:
    name: seccomp-pod
  spec:
    containers:
    - name: app
      image: busybox
      securityContext:
        seccompProfile:
          type: RuntimeDefault  # Apply default Seccomp profile
```

appArmorProfile 🛡️:

Purpose: Applies an AppArmor profile to restrict the container’s capabilities.
Options: RuntimeDefault, Unconfined, or Localhost with a profile name.
Use Case: Restrict a container’s access in an AppArmor-enabled environment.

Example: A container with a custom AppArmor profile.

  apiVersion: v1
  kind: Pod
  metadata:
    name: apparmor-pod
  spec:
    containers:
    - name: app
      image: busybox
      securityContext:
        appArmorProfile:
          type: Localhost
          localhostProfile: k8s-apparmor-example-deny-write

seLinuxOptions 🔐:

Purpose: Overrides pod-level SELinux labels for the container.
Use Case: Apply a specific SELinux label to a container in an SELinux-enabled cluster.

Example: A container requiring a unique SELinux label.

  apiVersion: v1
  kind: Pod
  metadata:
    name: selinux-container-pod
  spec:
    containers:
    - name: app
      image: busybox
      securityContext:
        seLinuxOptions:
          level: "s0:c789,c012"

procMount 🖥️:

Purpose: Overrides pod-level procMount settings for the container.
Use Case: A specific container needs an unmasked /proc for nested container runtimes.

Example: A container running a nested Kubernetes cluster.

  apiVersion: v1
  kind: Pod
  metadata:
    name: nested-k8s-pod
  spec:
    containers:
    - name: k8s
      image: kindest/node
      securityContext:
        procMount: Unmasked  # Full /proc access

🌐 Real-Life Example for Container-Level SecurityContext

Scenario: A pod runs a web application (Nginx) and a monitoring tool requiring specific privileges (e.g., NET_ADMIN for network diagnostics).

apiVersion: v1
kind: Pod
metadata:
  name: web-monitor-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsNonRoot: true
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
  - name: monitor
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
    securityContext:
      runAsUser: 2000  # Override pod-level runAsUser
      capabilities:
        add: ["NET_ADMIN"]  # Grant network privileges
      allowPrivilegeEscalation: false  # Prevent escalation

Explanation:

The pod-level runAsUser: 1000 applies to the Nginx container.
The monitor container overrides this with runAsUser: 2000 and adds NET_ADMIN for diagnostics.
allowPrivilegeEscalation: false ensures the monitor cannot gain additional privileges.

🔓 4. Privileged Mode

Privileged mode (privileged: true) grants a container full root privileges, equivalent to Docker’s --privileged flag. It bypasses most security restrictions, giving the container access to the host’s resources.

❓ When to Use Privileged Mode

Use Case: Rare scenarios requiring unrestricted access, such as:
- Running system utilities (e.g., kernel debugging tools).
- Nested container runtimes (e.g., Docker-in-Docker).
- Hardware access (e.g., GPU drivers).
Risks: Highly insecure, as it allows the container to affect the host system. Avoid unless absolutely necessary.

🌐 Example of Privileged Mode

Scenario: A pod running a Docker-in-Docker (DinD) setup for a CI/CD pipeline.

apiVersion: v1
kind: Pod
metadata:
  name: dind-pod
spec:
  containers:
  - name: docker
    image: docker:dind
    securityContext:
      privileged: true  # Full root privileges
    command: ["dockerd"]

Explanation:

The docker:dind image requires privileged mode to run the Docker daemon, which needs access to the host’s kernel and devices.
This setup is common in CI/CD pipelines (e.g., Jenkins) but should be tightly controlled due to security risks.

⚖️ 5. Pod-Level vs. Container-Level SecurityContext: Differences

Aspect	Pod-Level SecurityContext 🗂️	Container-Level SecurityContext 🖥️
Scope	Applies to all containers in the pod and volumes.	Applies only to the specific container.
Fields Available	Includes `fsGroup`, `supplementalGroups`, `seLinuxOptions`, `fsGroupChangePolicy`, `supplementalGroupsPolicy`, `procMount`.	Includes `capabilities`, `privileged`, `readOnlyRootFilesystem`, `seccompProfile`, `appArmorProfile`, and overrides for `runAsUser`, `runAsGroup`, `runAsNonRoot`, `seLinuxOptions`, `procMount`.
Volume Impact	Affects volume ownership and permissions (`fsGroup`, `seLinuxOptions`).	Does not affect volumes.
Override Behavior	Provides default settings for all containers.	Overrides pod-level settings for the container.
Use Case	Set baseline security for all containers and volumes (e.g., shared volume permissions).	Customize security for a specific container (e.g., add capabilities or run as root).

Example of Pod vs. Container-Level Interaction:

apiVersion: v1
kind: Pod
metadata:
  name: mixed-security-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: app
    image: nginx
  - name: privileged-tool
    image: busybox
    securityContext:
      runAsUser: 0  # Override to run as root
      privileged: true  # Full privileges
      capabilities:
        add: ["SYS_ADMIN"]

Explanation:

The app container uses the pod-level settings (runAsUser: 1000, runAsGroup: 3000).
The privileged-tool container overrides these with runAsUser: 0 and runs in privileged mode with additional capabilities.
The fsGroup: 2000 applies to any shared volumes, unaffected by container-level settings.

🤔 6. When to Use Pod-Level vs. Container-Level SecurityContext

Use Pod-Level SecurityContext 🗂️:
- When all containers in the pod share common security settings (e.g., non-root execution, volume ownership).
- For volume-related settings (fsGroup, seLinuxOptions) that apply across containers.
- Example: A pod with multiple containers sharing a volume, requiring consistent user and group settings.
Use Container-Level SecurityContext 🖥️:
- When a specific container needs different settings (e.g., one container needs NET_ADMIN or root privileges).
- For container-specific restrictions like readOnlyRootFilesystem or seccompProfile.
- Example: A pod where one container runs a privileged task while others are restricted.

✅ 7. Best Practices and Real-Life Considerations

Minimize Privileges 🔒:
- Avoid privileged: true unless absolutely necessary.
- Use runAsNonRoot: true and drop unnecessary capabilities.
Use Read-Only Filesystems 📖:
- Set readOnlyRootFilesystem: true for containers that don’t need to write to their filesystem.
Optimize Volume Permissions 💾:
- Use fsGroupChangePolicy: OnRootMismatch for large volumes to reduce startup time.
- Use supplementalGroupsPolicy: Strict to avoid unintended group memberships.
Leverage Seccomp and AppArmor 🛡️:
- Apply seccompProfile: RuntimeDefault and AppArmor profiles for additional security layers.
SELinux in Secure Environments 🔐:
- Use seLinuxOptions and seLinuxChangePolicy: Recursive in SELinux-enabled clusters for fine-grained control.
Monitor and Audit 📊:
- Use tools like kubectl describe pod and metrics (e.g., selinux_warning_controller_selinux_volume_conflict) to detect misconfigurations.

🎯 8. Conclusion

Pod-level SecurityContext 🗂️ is ideal for setting baseline security policies and managing volume permissions across all containers in a pod. Container-level SecurityContext 🖥️ allows fine-grained customization for individual containers, overriding pod-level settings when needed. Privileged mode 🔓 should be used sparingly due to its security risks.

Mastering Kubernetes Security: A Deep Dive into SecurityContext

Table of contents

🛡️ 1. Introduction to SecurityContext in Kubernetes

📂 2. Pod-Level SecurityContext

🛠️ Fields in Pod-Level SecurityContext

🌐 Real-Life Example for Pod-Level SecurityContext

🖥️ 3. Container-Level SecurityContext

🛠️ Fields in Container-Level SecurityContext

🌐 Real-Life Example for Container-Level SecurityContext

🔓 4. Privileged Mode

❓ When to Use Privileged Mode

🌐 Example of Privileged Mode

⚖️ 5. Pod-Level vs. Container-Level SecurityContext: Differences

🤔 6. When to Use Pod-Level vs. Container-Level SecurityContext

✅ 7. Best Practices and Real-Life Considerations

🎯 8. Conclusion

Subscribe to my newsletter

Omkar Shelke

Omkar Shelke