Static Provisioned Environments for Specialized Workloads: GPU and CPU-Intensive Tasks

Introduction
Modern computational workloads often require specialized resources, particularly for machine learning, scientific computing, and data processing. Kubernetes can schedule GPU and CPU-intensive workloads, but its control plane and container runtime add overhead that dedicated, long-running jobs do not need. This article demonstrates how to create isolated environments specifically optimized for GPU-accelerated applications and CPU-intensive tasks using Linux's native isolation capabilities.
System Architecture Overview
Our approach creates two distinct isolated environments:
GPU Partition: For machine learning, rendering, or other GPU-accelerated workloads
CPU-Intensive Partition: For multi-threaded computational tasks that benefit from dedicated CPU resources
Each environment will have:
Resource isolation via namespaces and cgroups (a minimal sketch of how these pieces combine follows this list)
Optimized libraries and tooling for their specific workload type
Direct hardware access where required
Performance monitoring capabilities
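Before walking through the full scripts, it helps to see the pattern in miniature. The sketch below is illustrative only: the unit name demo-env and the resource values are placeholders, not part of the setup that follows. It combines a network namespace with a transient systemd unit to get isolation plus hard resource ceilings for a single process:
# Minimal sketch: one process in its own network namespace with CPU and memory ceilings
ip netns add demo-env
systemd-run --unit=demo-env --property=CPUQuota=200% --property=MemoryMax=2G \
ip netns exec demo-env sleep infinity
The environments built in the rest of this article are essentially this pattern, plus a chroot filesystem, device nodes, and persistent bind mounts.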
Base System Preparation
Start with a comprehensive initialization script:
#!/bin/bash
# /opt/specialized-environments/init.sh
# Load kernel modules required for GPU access (cpuset is a built-in cgroup
# controller, not a loadable module, so nothing needs to be loaded for CPU isolation)
modprobe nvidia
modprobe nvidia_uvm
# Kernel tuning: NUMA-related memory settings, plus IP forwarding for the NAT rules below
echo 1 > /proc/sys/kernel/numa_balancing
echo 1 > /proc/sys/vm/zone_reclaim_mode
echo 1 > /proc/sys/net/ipv4/ip_forward
# Create base directories
mkdir -p /var/lib/environments/{gpu-compute,cpu-compute}
mkdir -p /var/lib/environment-data/{gpu-compute,cpu-compute}
mkdir -p /var/run/environments
# Create network bridge for isolated environments
ip link add name compute-bridge type bridge
ip addr add 10.200.0.1/24 dev compute-bridge
ip link set compute-bridge up
# Setup iptables for outbound connectivity
iptables -t nat -A POSTROUTING -s 10.200.0.0/24 -j MASQUERADE
# Set global CPU performance governor for maximum performance
for governor in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
echo performance > $governor
done
# Load environment configurations and run setup
source /etc/specialized-environments/gpu-compute.conf
source /etc/specialized-environments/cpu-compute.conf
# Initialize environments
setup_gpu_environment
setup_cpu_environment
# Start monitoring
systemctl start environment-monitor.service
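Before moving on, it is worth confirming that the base plumbing is in place. These checks are only a suggestion and assume the defaults used above (an NVIDIA GPU, the compute-bridge device, and the 10.200.0.0/24 subnet):
# Sanity checks after running init.sh
lsmod | grep nvidia                                        # GPU modules loaded
ip addr show compute-bridge                                # bridge up with 10.200.0.1/24
iptables -t nat -L POSTROUTING -n | grep 10.200.0.0/24     # MASQUERADE rule present
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor  # should print "performance"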
GPU-Accelerated Environment Setup
The following script creates an isolated environment optimized for GPU workloads:
#!/bin/bash
# Part of /etc/specialized-environments/gpu-compute.conf
setup_gpu_environment() {
local ENV_NAME="gpu-compute"
local ENV_ROOT="/var/lib/environments/${ENV_NAME}"
local DATA_ROOT="/var/lib/environment-data/${ENV_NAME}"
echo "Setting up ${ENV_NAME} environment..."
# Create network namespace
ip netns add ${ENV_NAME}
# Create veth pair (interface names must stay under 16 characters)
ip link add veth-gpu type veth peer name veth0
ip link set veth-gpu master compute-bridge
ip link set veth-gpu up
ip link set veth0 netns ${ENV_NAME}
# Configure networking inside the namespace; the bridge address (10.200.0.1) is the gateway
ip netns exec ${ENV_NAME} ip addr add 10.200.0.11/24 dev veth0
ip netns exec ${ENV_NAME} ip link set veth0 up
ip netns exec ${ENV_NAME} ip link set lo up
ip netns exec ${ENV_NAME} ip route add default via 10.200.0.1
# Prepare filesystem structure for GPU environment
if [ ! -d "${ENV_ROOT}/usr/local/cuda" ]; then
# Create minimal filesystem with CUDA support
mkdir -p ${ENV_ROOT}/{bin,sbin,lib,lib64,usr,etc,var,tmp,proc,sys,dev,run,opt}
mkdir -p ${ENV_ROOT}/usr/{bin,sbin,lib,lib64,local,share}
mkdir -p ${ENV_ROOT}/usr/local/{cuda,bin,lib64}
mkdir -p ${ENV_ROOT}/var/{log,tmp}
mkdir -p ${ENV_ROOT}/opt/ml/{model,input,output}
# Copy essential binaries together with the shared libraries they depend on,
# otherwise nothing will run inside the chroot
for bin in /bin/{bash,ls,mkdir,cp,rm,mount}; do
cp ${bin} ${ENV_ROOT}/bin/
for dep in $(ldd ${bin} | grep -o '/[^ ]*'); do
mkdir -p ${ENV_ROOT}$(dirname ${dep})
cp -n ${dep} ${ENV_ROOT}${dep}
done
done
# Copy CUDA libraries and binaries (for NVIDIA GPUs)
if [ -d "/usr/local/cuda" ]; then
cp -r /usr/local/cuda/bin/* ${ENV_ROOT}/usr/local/cuda/bin/
cp -r /usr/local/cuda/lib64/* ${ENV_ROOT}/usr/local/cuda/lib64/
cp -r /usr/local/cuda/include ${ENV_ROOT}/usr/local/cuda/
# Create CUDA symbolic links
ln -s /usr/local/cuda/lib64/libcudart.so ${ENV_ROOT}/usr/lib64/
ln -s /usr/local/cuda/lib64/libcublas.so ${ENV_ROOT}/usr/lib64/
ln -s /usr/local/cuda/lib64/libcudnn.so ${ENV_ROOT}/usr/lib64/
fi
# Copy NVIDIA driver libraries
for lib in /usr/lib/x86_64-linux-gnu/libnvidia-*.so*; do
if [ -f "$lib" ]; then
mkdir -p ${ENV_ROOT}/usr/lib/x86_64-linux-gnu/
cp $lib ${ENV_ROOT}/usr/lib/x86_64-linux-gnu/
fi
done
# Copy Python, its standard library, and installed ML packages (if using PyTorch/TensorFlow)
if [ -d "/usr/local/lib/python3.9" ]; then
mkdir -p ${ENV_ROOT}/usr/local/lib
cp -r /usr/local/lib/python3.9 ${ENV_ROOT}/usr/local/lib/
cp /usr/local/bin/python3.9 ${ENV_ROOT}/usr/local/bin/
ln -s python3.9 ${ENV_ROOT}/usr/local/bin/python
# The interpreter itself also needs its shared-library dependencies
for dep in $(ldd /usr/local/bin/python3.9 | grep -o '/[^ ]*'); do
mkdir -p ${ENV_ROOT}$(dirname ${dep})
cp -n ${dep} ${ENV_ROOT}${dep}
done
fi
# Copy dependencies for typical ML frameworks
for lib in $(find /usr/lib /lib -name "libgomp*.so*" -o -name "libnuma*.so*" -o -name "libcudnn*.so*" 2>/dev/null); do
if [ -f "$lib" ]; then
mkdir -p ${ENV_ROOT}/$(dirname $lib)
cp $lib ${ENV_ROOT}/$lib
fi
done
# Create configuration for GPU environment
cat > ${ENV_ROOT}/etc/environment <<EOF
PATH=/usr/local/cuda/bin:/usr/local/bin:/usr/bin:/bin
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/lib64:/usr/lib:/lib
CUDA_VISIBLE_DEVICES=0
NVIDIA_VISIBLE_DEVICES=all
NVIDIA_DRIVER_CAPABILITIES=compute,utility
EOF
# Create sample GPU test script
cat > ${ENV_ROOT}/opt/gpu-test.py <<EOF
#!/usr/bin/env python
import torch
print("CUDA Available:", torch.cuda.is_available())
if torch.cuda.is_available():
print("CUDA Device Count:", torch.cuda.device_count())
print("CUDA Device Name:", torch.cuda.get_device_name(0))
# Simple tensor operation on GPU
x = torch.rand(5, 3).cuda()
print("GPU Tensor:", x)
EOF
chmod +x ${ENV_ROOT}/opt/gpu-test.py
fi
# Prepare persistent data directories
mkdir -p ${DATA_ROOT}/{logs,models,datasets}
# Setup resource limits with cgroups (legacy v1 hierarchy shown; on cgroup v2
# systems the systemd-run properties below enforce the equivalent limits)
mkdir -p /sys/fs/cgroup/cpu/${ENV_NAME}
mkdir -p /sys/fs/cgroup/memory/${ENV_NAME}
# Limit to 90% of system memory but prioritize GPU workloads
TOTAL_MEM=$(free -b | grep Mem | awk '{print $2}')
GPU_MEM_LIMIT=$(echo "$TOTAL_MEM * 0.9" | bc | cut -d. -f1)
echo $GPU_MEM_LIMIT > /sys/fs/cgroup/memory/${ENV_NAME}/memory.limit_in_bytes
echo 800000 > /sys/fs/cgroup/cpu/${ENV_NAME}/cpu.cfs_quota_us # 800% = 8 cores
echo 100000 > /sys/fs/cgroup/cpu/${ENV_NAME}/cpu.cfs_period_us
# Create bind mounts for persistent data
mount --bind ${DATA_ROOT}/logs ${ENV_ROOT}/var/log
mount --bind ${DATA_ROOT}/models ${ENV_ROOT}/opt/ml/model
mount --bind ${DATA_ROOT}/datasets ${ENV_ROOT}/opt/ml/input
# Mount required special filesystems
mount -t proc proc ${ENV_ROOT}/proc
mount -t sysfs sysfs ${ENV_ROOT}/sys
# Special handling for GPU devices: recreate the NVIDIA device nodes inside the environment
mkdir -p ${ENV_ROOT}/dev
for i in $(find /dev -name "nvidia*" -type c); do
DEVNAME=$(basename $i)
MAJOR=$(printf "%d" 0x$(stat -c "%t" $i))
MINOR=$(printf "%d" 0x$(stat -c "%T" $i))
[ -e ${ENV_ROOT}/dev/${DEVNAME} ] || mknod ${ENV_ROOT}/dev/${DEVNAME} c $MAJOR $MINOR
chmod 666 ${ENV_ROOT}/dev/${DEVNAME}
done
# Start GPU environment as a transient systemd unit (the command is passed
# directly to systemd-run rather than as an ExecStart= property)
systemd-run --unit=${ENV_NAME} --slice=specialized \
--property=CPUQuota=800% \
--property=IOWeight=800 \
--property=Restart=always \
/opt/specialized-environments/run-isolated.sh ${ENV_NAME} \
/bin/bash -c 'set -a && source /etc/environment && set +a && python /opt/gpu-test.py && sleep infinity'
# Port forwarding for services (e.g., Jupyter or ML server)
iptables -t nat -A PREROUTING -p tcp --dport 8888 -j DNAT --to-destination 10.200.0.11:8888
echo "${ENV_NAME} environment setup complete"
}
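With the function applied, the environment can be exercised directly from the host. The commands below are a hedged example: they assume the default paths used above, that PyTorch was present in the Python tree that was copied in, and that run-isolated.sh (shown later) is already in place:
# Verify device nodes, namespace networking, and the GPU test script
ls -l /var/lib/environments/gpu-compute/dev/nvidia*
ip netns exec gpu-compute ping -c 1 10.200.0.1
/opt/specialized-environments/run-isolated.sh gpu-compute \
/bin/bash -c 'set -a && source /etc/environment && set +a && python /opt/gpu-test.py'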
CPU-Intensive Environment Setup
The following script creates an isolated environment optimized for CPU-intensive tasks:
#!/bin/bash
# Part of /etc/specialized-environments/cpu-compute.conf
setup_cpu_environment() {
local ENV_NAME="cpu-compute"
local ENV_ROOT="/var/lib/environments/${ENV_NAME}"
local DATA_ROOT="/var/lib/environment-data/${ENV_NAME}"
echo "Setting up ${ENV_NAME} environment..."
# Create network namespace
ip netns add ${ENV_NAME}
# Create veth pair (interface names must stay under 16 characters)
ip link add veth-cpu type veth peer name veth0
ip link set veth-cpu master compute-bridge
ip link set veth-cpu up
ip link set veth0 netns ${ENV_NAME}
# Configure networking inside the namespace; the bridge address (10.200.0.1) is the gateway
ip netns exec ${ENV_NAME} ip addr add 10.200.0.21/24 dev veth0
ip netns exec ${ENV_NAME} ip link set veth0 up
ip netns exec ${ENV_NAME} ip link set lo up
ip netns exec ${ENV_NAME} ip route add default via 10.200.0.1
# Prepare filesystem structure for CPU-intensive environment
if [ ! -d "${ENV_ROOT}/usr/local/bin" ]; then
# Create minimal filesystem with CPU optimization tools
mkdir -p ${ENV_ROOT}/{bin,sbin,lib,lib64,usr,etc,var,tmp,proc,sys,dev,run,opt}
mkdir -p ${ENV_ROOT}/usr/{bin,sbin,lib,lib64,local,share}
mkdir -p ${ENV_ROOT}/usr/local/{bin,lib64,include}
mkdir -p ${ENV_ROOT}/var/{log,tmp}
mkdir -p ${ENV_ROOT}/opt/data/{input,output}
# Copy essential binaries together with the shared libraries they depend on
for bin in /bin/{bash,ls,mkdir,cp,rm,mount}; do
cp ${bin} ${ENV_ROOT}/bin/
for dep in $(ldd ${bin} | grep -o '/[^ ]*'); do
mkdir -p ${ENV_ROOT}$(dirname ${dep})
cp -n ${dep} ${ENV_ROOT}${dep}
done
done
# Copy CPU optimization libraries (OpenMP, MPI, etc.)
for lib in $(find /usr/lib /lib -name "libgomp*.so*" -o -name "libopenmpi*.so*" -o -name "libmpi*.so*" -o -name "libomp*.so*" -o -name "libnuma*.so*" 2>/dev/null); do
if [ -f "$lib" ]; then
mkdir -p ${ENV_ROOT}/$(dirname $lib)
cp $lib ${ENV_ROOT}/$lib
fi
done
# Copy high-performance computing tools (note: a working compiler inside the chroot
# also needs its internal programs, headers, and startup objects such as cc1, crt*.o
# and /usr/include; bind-mounting those paths read-only is a simpler alternative)
if [ -f "/usr/bin/gcc" ]; then cp /usr/bin/gcc ${ENV_ROOT}/usr/bin/; fi
if [ -f "/usr/bin/g++" ]; then cp /usr/bin/g++ ${ENV_ROOT}/usr/bin/; fi
if [ -f "/usr/bin/make" ]; then cp /usr/bin/make ${ENV_ROOT}/usr/bin/; fi
if [ -f "/usr/bin/cmake" ]; then cp /usr/bin/cmake ${ENV_ROOT}/usr/bin/; fi
# Copy OpenBLAS/LAPACK for scientific computing
for lib in $(find /usr/lib /lib -name "libblas*.so*" -o -name "liblapack*.so*" -o -name "libopenblas*.so*" 2>/dev/null); do
if [ -f "$lib" ]; then
mkdir -p ${ENV_ROOT}/$(dirname $lib)
cp $lib ${ENV_ROOT}/$lib
fi
done
# Create configuration for CPU optimization
cat > ${ENV_ROOT}/etc/environment <<EOF
PATH=/usr/local/bin:/usr/bin:/bin
LD_LIBRARY_PATH=/usr/local/lib64:/usr/lib64:/usr/lib:/lib
OMP_NUM_THREADS=16
MKL_NUM_THREADS=16
OPENBLAS_NUM_THREADS=16
VECLIB_MAXIMUM_THREADS=16
NUMEXPR_NUM_THREADS=16
EOF
# Create sample CPU test script
cat > ${ENV_ROOT}/opt/cpu-test.c <<EOF
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#define SIZE 2000
#define ITERATIONS 5
void matrix_multiply(double *A, double *B, double *C, int n) {
#pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                sum += A[i*n+k] * B[k*n+j];
            }
            C[i*n+j] = sum;
        }
    }
}
int main() {
    double *A = (double*)malloc(SIZE*SIZE*sizeof(double));
    double *B = (double*)malloc(SIZE*SIZE*sizeof(double));
    double *C = (double*)malloc(SIZE*SIZE*sizeof(double));
    // Initialize matrices with random values
    for (int i = 0; i < SIZE*SIZE; i++) {
        A[i] = (double)rand() / RAND_MAX;
        B[i] = (double)rand() / RAND_MAX;
    }
    printf("Running with %d OpenMP threads\n", omp_get_max_threads());
    // Use wall-clock time; clock() would sum CPU time across all OpenMP threads
    double start = omp_get_wtime();
    // Perform matrix multiplications
    for (int i = 0; i < ITERATIONS; i++) {
        matrix_multiply(A, B, C, SIZE);
    }
    double time_spent = omp_get_wtime() - start;
    printf("Completed %d iterations of %dx%d matrix multiplication in %.2f seconds\n",
           ITERATIONS, SIZE, SIZE, time_spent);
    free(A);
    free(B);
    free(C);
    return 0;
}
EOF
# Compile the CPU test
cat > ${ENV_ROOT}/opt/compile-test.sh <<EOF
#!/bin/bash
cd /opt
gcc -fopenmp -O3 cpu-test.c -o cpu-benchmark
EOF
chmod +x ${ENV_ROOT}/opt/compile-test.sh
fi
# Prepare persistent data directories
mkdir -p ${DATA_ROOT}/{logs,data,results}
# Setup resource limits with cgroups (legacy v1 hierarchy shown; on cgroup v2
# systems the systemd-run properties below enforce the equivalent limits)
mkdir -p /sys/fs/cgroup/cpu/${ENV_NAME}
mkdir -p /sys/fs/cgroup/memory/${ENV_NAME}
mkdir -p /sys/fs/cgroup/cpuset/${ENV_NAME}
# Determine the dedicated core range, reserving cores 0-1 for the host system
CPU_COUNT=$(nproc)
LAST_CORE=$(($CPU_COUNT - 1))
# Allocate specific CPU cores (e.g., cores 2-15 if you have 16 cores)
echo "2-$LAST_CORE" > /sys/fs/cgroup/cpuset/${ENV_NAME}/cpuset.cpus
# Allow all NUMA memory nodes present on this machine
cat /sys/fs/cgroup/cpuset/cpuset.mems > /sys/fs/cgroup/cpuset/${ENV_NAME}/cpuset.mems
# Set memory limits
TOTAL_MEM=$(free -b | grep Mem | awk '{print $2}')
CPU_MEM_LIMIT=$(echo "$TOTAL_MEM * 0.7" | bc | cut -d. -f1) # 70% of system memory
echo $CPU_MEM_LIMIT > /sys/fs/cgroup/memory/${ENV_NAME}/memory.limit_in_bytes
# Set CPU scheduling priority
echo 1500000 > /sys/fs/cgroup/cpu/${ENV_NAME}/cpu.cfs_quota_us # 1500% = 15 cores
echo 100000 > /sys/fs/cgroup/cpu/${ENV_NAME}/cpu.cfs_period_us
# Create bind mounts for persistent data
mount --bind ${DATA_ROOT}/logs ${ENV_ROOT}/var/log
mount --bind ${DATA_ROOT}/data ${ENV_ROOT}/opt/data/input
mount --bind ${DATA_ROOT}/results ${ENV_ROOT}/opt/data/output
# Mount required special filesystems
mount -t proc proc ${ENV_ROOT}/proc
mount -t sysfs sysfs ${ENV_ROOT}/sys
# Start CPU environment as a transient systemd unit (the command is passed
# directly to systemd-run rather than as an ExecStart= property)
systemd-run --unit=${ENV_NAME} --slice=specialized \
--property=CPUQuota=1500% \
--property=CPUAffinity=2-${LAST_CORE} \
--property=IOWeight=900 \
--property=Restart=always \
/opt/specialized-environments/run-isolated.sh ${ENV_NAME} \
/bin/bash -c 'set -a && source /etc/environment && set +a && /opt/compile-test.sh && /opt/cpu-benchmark && sleep infinity'
# Port forwarding for services
iptables -t nat -A PREROUTING -p tcp --dport 9090 -j DNAT --to-destination 10.200.0.21:9090
echo "${ENV_NAME} environment setup complete"
}
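As with the GPU partition, a quick check confirms the CPU environment is wired up. These commands assume the legacy cgroup v1 cpuset path used above and that a complete compiler toolchain is available inside the chroot:
# Verify the dedicated core range, namespace networking, and the benchmark
cat /sys/fs/cgroup/cpuset/cpu-compute/cpuset.cpus
ip netns exec cpu-compute ping -c 1 10.200.0.1
/opt/specialized-environments/run-isolated.sh cpu-compute \
/bin/bash -c 'set -a && source /etc/environment && set +a && /opt/compile-test.sh && /opt/cpu-benchmark'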
Execution Script for Isolated Environments
This script handles the namespace isolation for both environments:
#!/bin/bash
# /opt/specialized-environments/run-isolated.sh
ENV_NAME="$1"
shift
COMMAND="$@"
ENV_ROOT="/var/lib/environments/${ENV_NAME}"
# Enter network namespace first
ip netns exec ${ENV_NAME} unshare --mount --uts --ipc --pid --fork \
chroot ${ENV_ROOT} /bin/bash -c "mount -t proc proc /proc && mount -t sysfs sysfs /sys && exec ${COMMAND}"
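Usage is the same for both environments: the first argument names the environment, everything after it is the command to run inside the chroot. For example:
# Open an interactive shell inside the GPU environment
/opt/specialized-environments/run-isolated.sh gpu-compute /bin/bash
# List the bind-mounted dataset directory inside the CPU environment
/opt/specialized-environments/run-isolated.sh cpu-compute /bin/ls /opt/data/input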
Performance Monitoring System
Create a monitoring script to track resource utilization and performance:
#!/bin/bash
# /opt/specialized-environments/monitor.sh
# Configuration
CHECK_INTERVAL=30
LOG_DIR="/var/log/specialized-environments"
mkdir -p ${LOG_DIR}
# GPU monitoring function
monitor_gpu_environment() {
# Log timestamp
TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
# Get GPU stats using nvidia-smi
if command -v nvidia-smi &> /dev/null; then
GPU_UTIL=$(nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits)
GPU_MEM=$(nvidia-smi --query-gpu=memory.used --format=csv,noheader)
GPU_TEMP=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader)
echo "$TIMESTAMP,GPU,$GPU_UTIL%,$GPU_MEM,$GPU_TEMP°C" >> ${LOG_DIR}/gpu-metrics.csv
fi
# Check if environment is running
systemctl is-active --quiet gpu-compute
if [ $? -ne 0 ]; then
echo "$TIMESTAMP: GPU environment not running, restarting..." >> ${LOG_DIR}/gpu-events.log
systemctl restart gpu-compute
fi
}
# CPU monitoring function
monitor_cpu_environment() {
# Log timestamp
TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")
# Get CPU stats for the dedicated cores
CPU_UTIL=$(top -b -n1 | grep "Cpu(s)" | awk '{print $2 + $4}')
MEM_UTIL=$(free | grep Mem | awk '{print $3/$2 * 100.0}')
echo "$TIMESTAMP,CPU,$CPU_UTIL%,$MEM_UTIL%" >> ${LOG_DIR}/cpu-metrics.csv
# Check if environment is running
systemctl is-active --quiet cpu-compute
if [ $? -ne 0 ]; then
echo "$TIMESTAMP: CPU environment not running, restarting..." >> ${LOG_DIR}/cpu-events.log
systemctl restart cpu-compute
fi
}
# Create header for CSV files if they don't exist
if [ ! -f ${LOG_DIR}/gpu-metrics.csv ]; then
echo "Timestamp,Type,Utilization,Memory,Temperature" > ${LOG_DIR}/gpu-metrics.csv
fi
if [ ! -f ${LOG_DIR}/cpu-metrics.csv ]; then
echo "Timestamp,Type,CPU_Utilization,Memory_Utilization" > ${LOG_DIR}/cpu-metrics.csv
fi
# Main monitoring loop
while true; do
monitor_gpu_environment
monitor_cpu_environment
sleep ${CHECK_INTERVAL}
done
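Because the metrics land in plain CSV, quick ad-hoc analysis needs nothing more than awk. As one example (assuming the column layout written above), average GPU utilization across all collected samples:
awk -F, 'NR>1 {gsub(/%/,"",$3); sum+=$3; n++} END {if (n) printf "avg GPU util: %.1f%%\n", sum/n}' \
/var/log/specialized-environments/gpu-metrics.csv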
Create a systemd service for the monitoring system:
# /etc/systemd/system/environment-monitor.service
[Unit]
Description=Specialized Environment Monitoring
After=network.target
[Service]
Type=simple
ExecStart=/opt/specialized-environments/monitor.sh
Restart=always
[Install]
WantedBy=multi-user.target
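Reload systemd and start the monitor so metrics begin accumulating:
systemctl daemon-reload
systemctl enable --now environment-monitor.service
systemctl status environment-monitor.service --no-pager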
Boot-time Integration
Ensure the environments start at boot time:
# /etc/systemd/system/specialized-environments.service
[Unit]
Description=Specialized Computing Environments
After=network.target
Before=gpu-compute.service cpu-compute.service
[Service]
Type=oneshot
ExecStart=/opt/specialized-environments/init.sh
RemainAfterExit=true
ExecStop=/opt/specialized-environments/cleanup.sh
[Install]
WantedBy=multi-user.target
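After writing the unit file, enable it so the whole stack comes up on the next boot, or start it immediately to test:
systemctl daemon-reload
systemctl enable specialized-environments.service
systemctl start specialized-environments.service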
Environment Cleanup Script
For proper teardown of the environments:
#!/bin/bash
# /opt/specialized-environments/cleanup.sh
# Stop services
systemctl stop gpu-compute cpu-compute
# Unmount filesystems
for ENV_NAME in gpu-compute cpu-compute; do
ENV_ROOT="/var/lib/environments/${ENV_NAME}"
umount ${ENV_ROOT}/proc
umount ${ENV_ROOT}/sys
# Unmount environment-specific mounts
if [ "${ENV_NAME}" == "gpu-compute" ]; then
umount ${ENV_ROOT}/var/log
umount ${ENV_ROOT}/opt/ml/model
umount ${ENV_ROOT}/opt/ml/input
else
umount ${ENV_ROOT}/var/log
umount ${ENV_ROOT}/opt/data/input
umount ${ENV_ROOT}/opt/data/output
fi
done
# Remove network namespaces
ip netns del gpu-compute
ip netns del cpu-compute
# Remove veth host ends if they still exist (deleting the namespaces usually removes them)
ip link del veth-gpu 2>/dev/null
ip link del veth-cpu 2>/dev/null
# Remove bridge
ip link set compute-bridge down
ip link del compute-bridge
# Remove only the NAT rules added by the setup scripts
iptables -t nat -D POSTROUTING -s 10.200.0.0/24 -j MASQUERADE
iptables -t nat -D PREROUTING -p tcp --dport 8888 -j DNAT --to-destination 10.200.0.11:8888
iptables -t nat -D PREROUTING -p tcp --dport 9090 -j DNAT --to-destination 10.200.0.21:9090
# Reset CPU governor back to the ondemand default
for governor in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
echo ondemand > $governor
done
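Because cleanup.sh is registered as the ExecStop of the boot-time unit, teardown and re-provisioning can be driven through systemctl instead of calling the scripts by hand:
systemctl stop specialized-environments.service    # runs cleanup.sh
systemctl start specialized-environments.service   # runs init.sh again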
Submitting Jobs to Specialized Environments
Create scripts to easily submit jobs to each environment:
GPU Job Submission
#!/bin/bash
# /usr/local/bin/submit-gpu-job
if [ $# -lt 1 ]; then
echo "Usage: $0 <script.py> [args...]"
exit 1
fi
SCRIPT="$1"
shift
ARGS="$@"
SCRIPT_NAME=$(basename "$SCRIPT")
# Copy script to GPU environment
cp "$SCRIPT" /var/lib/environment-data/gpu-compute/models/
# Run the script in the GPU environment
systemd-run --unit=gpu-job-$(date +%s) --slice=specialized \
--property=CPUQuota=800% \
/opt/specialized-environments/run-isolated.sh gpu-compute \
/bin/bash -c "source /etc/environment && cd /opt/ml/model && python ${SCRIPT_NAME} ${ARGS} > /var/log/job-$(date +%s).log 2>&1"
echo "Job submitted to GPU environment"
CPU Job Submission
#!/bin/bash
# /usr/local/bin/submit-cpu-job
if [ $# -lt 1 ]; then
echo "Usage: $0 <executable> [args...]"
exit 1
fi
EXEC="$1"
shift
ARGS="$@"
EXEC_NAME=$(basename "$EXEC")
# Copy executable to CPU environment
cp "$EXEC" /var/lib/environment-data/cpu-compute/data/
# Run the executable in the CPU environment
systemd-run --unit=cpu-job-$(date +%s) --slice=specialized \
--property=CPUQuota=1500% \
--property=CPUAffinity=2-15 \
/opt/specialized-environments/run-isolated.sh cpu-compute \
/bin/bash -c "source /etc/environment && cd /opt/data/input && ./${EXEC_NAME} ${ARGS} > /var/log/job-$(date +%s).log 2>&1"
echo "Job submitted to CPU environment"
Resource Efficiency Compared to Kubernetes
The static provisioning approach provides several advantages over Kubernetes for specialized workloads:
Direct hardware access: GPU passthrough is simpler with direct namespace isolation
Reduced context switching: dedicated CPU pinning keeps tasks on the same cores, minimizing scheduler migrations and cache thrashing
Lower memory overhead: No container runtime or orchestration overhead
Optimized libraries: Environment contains only the necessary libraries for each workload type
Streamlined I/O paths: Direct device access without abstraction layers
Security Benefits
Reduced attack surface: No container runtime exploits or Kubernetes API vulnerabilities
Isolated device access: Direct control over which devices are exposed to each environment
Simplified privilege model: No complex RBAC or container security contexts
Resource boundaries: Hard cgroup limits prevent resource starvation between environments
Practical Use Cases
This approach is particularly well-suited for:
Machine learning training: GPU-optimized environment for frameworks like TensorFlow/PyTorch
Scientific computing: CPU-optimized environment for simulations and data analysis
Rendering farms: Predictable GPU resource allocation for graphics workloads
High-performance databases: CPU-isolated environment for database operations
Signal processing: Real-time processing with dedicated CPU resources
Conclusion
For specialized GPU and CPU-intensive workloads, this static provisioning approach offers significant advantages in terms of performance, resource utilization, and simplicity compared to container orchestration platforms. By leveraging Linux's native isolation capabilities and optimizing each environment for its specific computational needs, this solution provides a lightweight yet powerful alternative for organizations that need to maximize the performance of specialized hardware resources without the overhead of container orchestration.
While this approach requires more initial setup and customization than deploying containers on Kubernetes, it offers greater control and efficiency for stable, long-running specialized workloads where direct hardware access and performance are critical factors.