Linux Namespaces and cgroups: Building Blocks of Modern Containerization

Maxat Akbanov

Linux namespaces and cgroups (control groups) are foundational Linux kernel features used to provide process isolation and resource management. They are core components of container runtimes such as Docker and Podman, and of orchestration platforms such as Kubernetes.

💡
According to the official Control Group v2 documentation, "cgroup" is never capitalized. The singular form is used to designate the whole feature and also as a qualifier, as in “cgroup controllers”. When explicitly referring to multiple individual control groups, the plural form “cgroups” is used.

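Namespaces first appeared in the Linux kernel in 2002, starting with the mount namespace in kernel 2.4.19, and cgroups were merged in kernel 2.6.24 in 2008. Adequate container support arrived in 2013 with kernel 3.8, which completed user namespaces.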

Linux Namespaces

According to Wikipedia, Linux namespaces:

are a feature of the Linux kernel that partition kernel resources such that one set of processes sees one set of resources, while another set of processes sees a different set of resources.

Examples of such resources are process IDs, hostnames, user IDs, file names, some names associated with network access, and inter-process communication.
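To make this concrete, the sketch below lists the namespaces that a process belongs to. It assumes a Linux system with /proc mounted; the helper name list_namespaces is hypothetical. Two processes whose entries show the same identifier share that namespace.

```shell
#!/bin/bash
# Print the namespace identifiers of a process (defaults to the current shell).
# Each entry under /proc/<pid>/ns is a symlink whose target looks like "pid:[4026531836]".
list_namespaces() {
    local pid="${1:-$$}"
    local entry
    for entry in /proc/"$pid"/ns/*; do
        # Skip the unexpanded glob if the directory does not exist.
        [ -L "$entry" ] || continue
        printf '%-20s %s\n' "$(basename "$entry")" "$(readlink "$entry")"
    done
}

list_namespaces
```

Passing a PID as the first argument, for example list_namespaces 1, shows the namespaces of another process, provided you have permission to read them.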

Types of Namespaces

  1. mnt (Mount Namespace):

    • Isolates the file system mount points.

    • Processes in different mount namespaces can have different views of the file system hierarchy.

  2. pid (Process ID Namespace):

    • Isolates the process ID number space.

    • Processes in a PID namespace see a separate set of process IDs, starting at 1 for the init process of that namespace.

  3. net (Network Namespace):

    • Isolates network-related resources like network interfaces, routing tables, and IP addresses.

    • Each namespace can have its own network stack.

  4. ipc (Inter-Process Communication Namespace):

    • Isolates IPC resources such as System V IPC objects and POSIX message queues.

  5. uts (UNIX Time-Sharing Namespace):

    • Isolates hostname and domain name.

    • Allows containers to have their own hostname.

    • As a result, it allows a single system to appear to have different host and domain names to different processes.

  6. user (User Namespace):

    • Isolates user and group IDs.

    • Processes in a user namespace can have a different set of user and group IDs, including root (UID 0) inside the namespace, without having root privileges on the host system.

Returning to PID namespaces: the first process created in a new PID namespace has PID 1, and its child processes are assigned subsequent PIDs. If a child process is created in its own nested PID namespace, it has PID 1 in that namespace as well as a PID in the parent namespace.
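The uts and user entries in the list above can be tried together without root. The following is a minimal sketch, assuming the unshare and hostname utilities are installed and that unprivileged user namespaces are enabled on the system; the function name and the hostname demo-container are hypothetical.

```shell
#!/bin/bash
# Change the hostname inside new user + UTS namespaces and print what is seen there.
# Prints "unavailable" if the namespaces cannot be created (for example, when
# unprivileged user namespaces are disabled).
hostname_in_new_uts() {
    local name="$1"
    unshare --user --map-root-user --uts \
        sh -c 'hostname "$1" && uname -n' sh "$name" 2>/dev/null \
        || echo "unavailable"
}

echo "host sees:       $(uname -n)"
echo "namespace sees:  $(hostname_in_new_uts demo-container)"
echo "host still sees: $(uname -n)"
```

The host's name should be identical in the first and last lines, because the change only applies inside the new UTS namespace.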

      Image credits: blog.nginx.org


Namespace Example

To create a namespace for isolating the process ID space:

unshare --user --pid --map-root-user --mount-proc --fork bash

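This command starts bash in new user and PID namespaces.

  • --user - creates a new user namespace

  • --pid - creates a new PID namespace

  • --map-root-user - maps the current user to root (UID 0) inside the new user namespace, so the process has root permissions there but not on the host

  • --mount-proc - mounts a new proc filesystem so that tools like ps read the process list of the new PID namespace; this option also implies a new mount namespace

  • --fork - forks the specified program as a child of unshare instead of running it directly, which is needed for the program to run inside the new PID namespace

  • bash - the program to run as the new process in the newly created namespaces

Running ps aux in the new shell shows that the namespace sees only its own processes, with bash as PID 1.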
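The same flags can be checked non-interactively. The sketch below runs a short command instead of bash and reports the PID and UID it sees. It assumes unprivileged user namespaces are enabled and that mounting proc is permitted; the function name is hypothetical.

```shell
#!/bin/bash
# Run a one-line probe inside new user + PID namespaces.
# Prints "unavailable" if the namespaces cannot be created on this system.
probe_new_namespaces() {
    unshare --user --pid --map-root-user --mount-proc --fork \
        sh -c 'echo "pid=$$ uid=$(id -u)"' 2>/dev/null \
        || echo "unavailable"
}

echo "host shell:    pid=$$ uid=$(id -u)"
echo "new namespace: $(probe_new_namespaces)"
```

Where the namespaces can be created, the probe should report PID 1 and UID 0, while the host line keeps the real values.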


Linux Control Groups (cgroups)

cgroups provide resource management by allowing you to allocate, prioritize, deny, or limit system resources (CPU, memory, disk I/O, etc.) for processes. The official definition is:

cgroup is a mechanism to organize processes hierarchically and distribute system resources along the hierarchy in a controlled and configurable manner.

Key Features

  1. Resource Limiting:

    You can configure a cgroup to set limits on the CPU, memory, disk I/O, and network bandwidth that its processes can use.

  2. Prioritization:

    You can control how much of a resource (CPU, disk, or network) a process can use compared to processes in another cgroup when there is resource contention.

  3. Accounting:

    Track resource usage of processes or groups of processes at the cgroup level.

  4. Control:

    Change the status (frozen, stopped, or restarted) of processes or terminate groups of processes.
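As an illustration of the accounting feature, the sketch below looks up the cgroup of the current shell and prints a few usage files. It assumes a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup; on other layouts it may print nothing. The function names are hypothetical.

```shell
#!/bin/bash
# Extract the cgroup v2 path from a /proc/<pid>/cgroup-style file.
# On cgroup v2 the file contains a line of the form "0::/some/path".
cgroup_v2_path() {
    sed -n 's/^0:://p' "${1:-/proc/self/cgroup}" 2>/dev/null
}

# Print selected accounting files for the current shell's cgroup, if readable.
show_usage() {
    local dir
    dir="/sys/fs/cgroup$(cgroup_v2_path)"
    local f
    for f in cpu.stat memory.current pids.current; do
        if [ -r "$dir/$f" ]; then
            echo "== $f"
            cat "$dir/$f"
        fi
    done
    return 0
}

show_usage
```

Which files exist depends on the controllers enabled for that cgroup, so the loop skips anything it cannot read.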

Hierarchy and Controllers

cgroup is largely composed of two parts: the core and controllers. The cgroup core is primarily responsible for hierarchically organizing processes. A cgroup controller is usually responsible for distributing a specific type of system resource along the hierarchy, although there are controllers which serve purposes other than resource distribution. Examples of what individual controllers do:

  • cpu: Limits CPU usage.

  • memory: Limits memory usage and manages swapping behavior.

  • blkio (io in cgroup v2): Controls block device I/O.

  • net_cls: Tags network packets for Quality of Service (QoS).

  • devices: Manages access to device nodes.
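To see which cgroup version and controllers a given machine offers, something like the following can be used. This is a sketch that assumes the conventional mount point /sys/fs/cgroup; the function names are hypothetical.

```shell
#!/bin/bash
# Guess the cgroup version from the filesystem type mounted at /sys/fs/cgroup:
# "cgroup2fs" indicates the unified v2 hierarchy, "tmpfs" the v1 layout.
cgroup_version() {
    case "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# List the controllers known to the kernel for the detected version.
list_controllers() {
    case "$(cgroup_version)" in
        v2) cat /sys/fs/cgroup/cgroup.controllers 2>/dev/null ;;
        v1) awk 'NR > 1 { printf "%s ", $1 } END { print "" }' /proc/cgroups 2>/dev/null ;;
        *)  echo "(cgroup filesystem not found)" ;;
    esac
}

echo "cgroup version: $(cgroup_version)"
echo "controllers:    $(list_controllers)"
```

On a hybrid setup the result is reported as v1, since the v2 hierarchy is then mounted in a subdirectory rather than at /sys/fs/cgroup itself.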

In short, you use cgroups to control how much of a given key resource (CPU, memory, network, and disk I/O) can be accessed or used by a process or set of processes. cgroups are a key component of containers because there are often multiple processes running in a container that you need to control together. In a Kubernetes environment, cgroups can be used to implement resource requests and limits and corresponding QoS classes at the pod level.

The following diagram illustrates how when you allocate a particular percentage of available system resources to a cgroup (in this case cgroup‑1), the remaining percentage is available to other cgroups (and individual processes) on the system.

Image credits: blog.nginx.org


Practical example of using cgroup

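Here’s an example script that creates a cgroup, limits CPU usage to 50% of one CPU core, and validates the behavior of the restricted process. It uses the cgroup v1 interface (/sys/fs/cgroup/cpu/), so it will not work as written on systems that mount only the unified cgroup v2 hierarchy.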

#!/bin/bash
# Requires root and the cgroup v1 cpu controller mounted at /sys/fs/cgroup/cpu.

# Define variables
CGROUP_NAME="my_cgroup"
CGROUP_PATH="/sys/fs/cgroup/cpu/$CGROUP_NAME"
CPU_LIMIT=50000       # Quota in microseconds: 50000 / 100000 = 50% of one CPU
CPU_PERIOD=100000     # Period in microseconds (default is 100ms)

if [ ! -d /sys/fs/cgroup/cpu ]; then
    echo "cgroup v1 cpu controller not found at /sys/fs/cgroup/cpu" >&2
    exit 1
fi

# Kill the test process and remove the cgroup when the script exits
cleanup() {
    if [ -n "${PROCESS_PID:-}" ]; then
        kill "$PROCESS_PID" 2>/dev/null
        wait "$PROCESS_PID" 2>/dev/null
    fi
    rmdir "$CGROUP_PATH" 2>/dev/null
}
trap cleanup EXIT
trap 'exit 130' INT TERM

# Step 1: Create a new cgroup
echo "Creating cgroup at $CGROUP_PATH..."
mkdir -p "$CGROUP_PATH" || exit 1

# Step 2: Set CPU usage limits
echo "Setting CPU limits..."
echo "$CPU_PERIOD" > "$CGROUP_PATH/cpu.cfs_period_us"
echo "$CPU_LIMIT" > "$CGROUP_PATH/cpu.cfs_quota_us"

# Step 3: Launch a background CPU-intensive process to test the CPU limit
echo "Starting a CPU-intensive process (infinite loop)..."
bash -c "while :; do :; done" &
PROCESS_PID=$!

echo "Process started with PID $PROCESS_PID"

# Step 4: Add the process to the cgroup
echo "Adding process $PROCESS_PID to cgroup..."
echo "$PROCESS_PID" > "$CGROUP_PATH/cgroup.procs"

# Step 5: Monitor CPU usage for the process
echo "Monitoring CPU usage (press Ctrl+C to exit)..."
while true; do
    CPU_USAGE=$(ps -p "$PROCESS_PID" -o %cpu=)
    echo "CPU Usage of PID $PROCESS_PID: $CPU_USAGE%"
    sleep 2
done

Execute the script:

sudo ./cgroup.sh

Explanation of the Script

  1. Create a cgroup:

    • The script creates a directory in /sys/fs/cgroup/cpu/ for the new cgroup.
  2. Set CPU limits:

    • cpu.cfs_quota_us: Specifies the total time (in microseconds) the cgroup is allowed to run on the CPU within each cpu.cfs_period_us period.

    • cpu.cfs_period_us: Defines the time period (in microseconds) used to calculate CPU quotas. Default is 100000 microseconds (100 ms).

  3. Run a CPU-intensive process:

    • An infinite loop (while :; do :; done) generates a CPU load for testing.
  4. Assign the process to the cgroup:

    • The script uses echo $PROCESS_PID > $CGROUP_PATH/cgroup.procs to attach the process to the cgroup.
  5. Monitor the CPU usage:

    • The ps command reports the process's CPU usage as a percentage, averaged over the lifetime of the process rather than measured instantaneously.
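The relationship between the quota and the period described in step 2 is simple arithmetic: the limit is quota divided by period. The sketch below computes it and, for a cgroup v1 directory, also reads nr_throttled from cpu.stat to show how often the kernel has throttled the group. The function names are hypothetical.

```shell
#!/bin/bash
# Convert a CFS quota/period pair into a percentage of one CPU.
# A negative quota (-1 in cgroup v1) means no limit is set.
cpu_limit_percent() {
    local quota="$1" period="$2"
    if [ "$quota" -lt 0 ]; then
        echo "unlimited"
    else
        echo "$(( quota * 100 / period ))"
    fi
}

# Describe the limit configured in a cgroup v1 cpu directory and report
# the number of periods in which the group was throttled.
describe_limit() {
    local dir="$1"
    local quota period
    quota=$(cat "$dir/cpu.cfs_quota_us") || return 1
    period=$(cat "$dir/cpu.cfs_period_us") || return 1
    echo "limit (percent of one CPU): $(cpu_limit_percent "$quota" "$period")"
    awk '$1 == "nr_throttled" { print "throttled periods: " $2 }' "$dir/cpu.stat"
}

describe_limit "/sys/fs/cgroup/cpu/my_cgroup" 2>/dev/null \
    || echo "my_cgroup not found (it exists only while the script is running)"
```

A quota larger than the period is valid and means more than one core, for example 200000 with a period of 100000 allows two full CPUs.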

Observe the CPU Usage:

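The script prints the CPU usage of the process every 2 seconds. The %CPU value may start slightly higher, because the process runs unrestricted for a moment before it joins the cgroup, but it should settle at about 50% of one CPU core, confirming the limitation.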
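On systems that use the unified cgroup v2 hierarchy, the two v1 files are replaced by a single cpu.max file containing the quota and the period. The following is a rough sketch of the equivalent setup, not a drop-in replacement for the script: it assumes cgroup v2 is mounted at /sys/fs/cgroup, that it is run as root, and that the cpu controller can be enabled for child cgroups. The cgroup name my_cgroup_v2 and the function names are hypothetical.

```shell
#!/bin/bash
CGROUP_ROOT="/sys/fs/cgroup"
CGROUP_NAME="my_cgroup_v2"   # hypothetical name

# Build the "<quota> <period>" string that cpu.max expects.
cpu_max_value() {
    local percent="$1" period="${2:-100000}"
    echo "$(( period * percent / 100 )) $period"
}

# Create the cgroup, set the limit, and move the given PID into it.
apply_v2_limit() {
    local pid="$1" percent="$2"
    local path="$CGROUP_ROOT/$CGROUP_NAME"
    # Child cgroups only expose cpu.max if the parent enables the cpu controller.
    echo "+cpu" > "$CGROUP_ROOT/cgroup.subtree_control" || return 1
    mkdir -p "$path" || return 1
    cpu_max_value "$percent" > "$path/cpu.max" || return 1
    echo "$pid" > "$path/cgroup.procs"
}

# Only touch the system when a PID is passed explicitly; otherwise just
# show what would be written.
if [ -n "${1:-}" ]; then
    apply_v2_limit "$1" 50 && echo "limit applied to PID $1"
else
    echo "would write '$(cpu_max_value 50)' to $CGROUP_ROOT/$CGROUP_NAME/cpu.max"
fi
```

Run without arguments, the sketch changes nothing and only prints the value it would write; passing a PID as the first argument applies the limit to that process.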


Summary table for namespaces vs. cgroups

| Feature  | Namespaces                        | cgroups                             |
|----------|-----------------------------------|-------------------------------------|
| Purpose  | Isolates system resources.        | Manages and limits resource usage.  |
| Scope    | Provides isolation.               | Provides control and accounting.    |
| Examples | Process ID, filesystem, network.  | CPU, memory, disk I/O.              |
| Use Case | Container isolation.              | Resource allocation for containers. |

