Kubernetes Pods Resource Sizing


When deploying applications to Kubernetes, correctly sizing the resources for our pods is crucial for achieving optimal performance and efficient resource utilization. This involves setting the right CPU and memory requests and limits based on our application's needs. In this post, we will share an approach to help define these parameters accurately when the application is request-based, such as an HTTP API.
Define Application Requirements
The first step is to define the goals our application aims to achieve. These are usually one or more of the following:
Response Time: The time it takes for the API to respond to a request.
Throughput: The number of requests the API can handle per second.
Error Rate: The percentage of failed requests.
Resource Utilization: CPU and memory usage on the server.
For our practical example, we will use:
95% of the requests must have a response time of less than 200 milliseconds.
We need to handle 50 requests per second.
The error rate must be less than 1% of the requests.
The CPU utilization must be less than 90% of the requested CPU.
The memory utilization must be less than 50% of the requested memory.
Set Up the Environment
We need an environment that closely resembles production, which is a Kubernetes cluster. Additionally, we need to set up monitoring tools to gather performance metrics, such as CPU and memory usage of our pods.
In this example, we chose the standalone version of Kubernetes included in Docker Desktop. To monitor the cluster, we installed the kube-prometheus stack, which includes components like Prometheus and Grafana, among others. The easiest way to install it is by using Helm:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
Once installed, we can access Grafana locally by using a simple port forwarding command:
kubectl port-forward svc/prometheus-grafana 3000:80 -n monitoring
Open http://localhost:3000/login in a browser, and use admin as the username and prom-operator as the password.
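Grafana already ships with dashboards for pod resources, but we can also query Prometheus directly. The queries below are a reasonable starting point; they rely on the standard cAdvisor metrics scraped by the kube-prometheus stack and assume the application will be deployed to the default namespace with a deployment named api-deployment, as shown later in this post:

# CPU usage (in cores) of the api pod, averaged over the last minute
sum(rate(container_cpu_usage_seconds_total{namespace="default", pod=~"api-deployment.*", container!=""}[1m]))
# Memory working set (in bytes) of the api pod
sum(container_memory_working_set_bytes{namespace="default", pod=~"api-deployment.*", container!=""})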
Select the Load Testing Tool
Choose a load-testing tool that fits your needs. Many options are available, but we recommend using k6 due to its developer-centric approach.
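If k6 is not installed yet, the official packages cover the common platforms; for example:

# Windows (Chocolatey)
choco install k6
# macOS (Homebrew)
brew install k6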
Set Initial Resources
Choose initial CPU and memory resources based on your experience; this will be the starting point for the analysis. In our case, we selected 128m for CPU and 256Mi for memory.
Deploy the Application
We built an API to calculate Pi using the Leibniz series to test this approach. Download the code here, and create the image with the following command:
docker build -t raulnq/mywebapi:1.0 -f .\MyWebApi\Dockerfile .
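Optionally, we can sanity-check the image locally before deploying it. The commands below assume the API listens on port 8080 inside the container (the same port the deployment below targets) and exposes the /pi endpoint:

# Run the container locally, mapping the API port
docker run --rm -p 8080:8080 raulnq/mywebapi:1.0
# In a second terminal, call the endpoint with a small number of iterations
curl "http://localhost:8080/pi?iterations=1000"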
The resources.yaml file defines a deployment with the requested resources and a service using the NodePort type:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-deployment
  labels:
    app: api
spec:
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api-container
          image: raulnq/mywebapi:1.0
          resources:
            requests:
              memory: "256Mi"
              cpu: "128m"
            limits:
              memory: "256Mi"
              cpu: "128m"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: api-service
  labels:
    app: api
spec:
  type: NodePort
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP
      nodePort: 30007
  selector:
    app: api
Deploy the application with the following command:
kubectl apply -f resources.yaml
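Once applied, a quick check confirms that the pod is running and that the API responds through the NodePort before we start any load test:

kubectl get pods -l app=api
kubectl get service api-service
# On Docker Desktop, NodePorts are reachable on localhost
curl "http://localhost:30007/pi?iterations=1000"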
Define the Load Tests
The next step is to identify the endpoints to be tested. In some cases, we can define user journeys and use fake input data to mimic real-world usage. The load.js file defines our test:
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 10 }, // ramp-up
    { duration: '2m', target: 10 }, // stay
    { duration: '1m', target: 0 },  // ramp-down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'], // 95% of requests must complete below 200ms
    http_req_failed: ['rate<0.01'],   // Error rate should be less than 1%
  },
};

export default function () {
  http.get('http://localhost:30007/pi?iterations=5000000');
  sleep(1);
}
The test above specifies three stages:
One minute to increase the load from zero to the target number of VUs (virtual users).
Two minutes maintaining the target load.
One minute to decrease the load from the target back to zero.
In addition, we set thresholds so that k6 marks the run as failed whenever it misses our goals. You can run the script using the following command:
k6 run load.js
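While the test runs, it helps to keep an eye on the pod's resource usage. The Grafana dashboards installed by the kube-prometheus stack cover this, and if the metrics-server is installed in the cluster, a quick command-line alternative is:

kubectl top pod -l app=api --containers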
Run the Load Tests and Adjust Resources
Run an initial test to understand the performance under minimal load. Then, based on the results, we will gradually increase the load for the next run, updating the resources if necessary until we achieve the goals defined in the first step. The results of the first run are as follows (data collected using k6 and Grafana):
Metric | Value |
--- | --- |
P95 | 19.77 ms |
Requests per second | 7.48 |
CPU | 0.066 cores |
% CPU usage | 51.56 |
Memory | 41.9 MiB |
% memory usage | 16.36 |
% error | 0 |
For 10 VUs, we met all our goals except for requests per second. We only achieved 7.48, which is far from the target of 50. So, let's increase the VUs to 70 and run the test:
Metric | Value |
--- | --- |
P95 | 4980 ms |
Requests per second | 16.43 |
CPU | 0.125 cores |
% CPU usage | 97.65 |
Memory | 54.3 MiB |
% memory usage | 21.21 |
% error | 0 |
From these results, we can conclude that we need to increase the requested CPU, given the high CPU usage and the slow response times with this setup. Let's modify the CPU request to 256m and the memory request to 128Mi, redeploy the application, and run the test again.
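We can edit resources.yaml and re-apply it, or, as a quicker alternative, patch the deployment directly with kubectl (this also triggers a rollout of the pod):

kubectl set resources deployment api-deployment --requests=cpu=256m,memory=128Mi --limits=cpu=256m,memory=128Mi

The results of this run are: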
Metric | Value |
--- | --- |
P95 | 1090 ms |
Requests per second | 33.19 |
CPU | 0.253 cores |
% CPU usage | 98.82 |
Memory | 46.3 MiB |
% memory usage | 36.17 |
% error | 0 |
We are closer to the goals but not there yet. Let's modify the CPU request to 512m, redeploy the application, and run the test:
Metric | Value |
--- | --- |
P95 | 89.17 ms |
Requests per second | 51.10 |
CPU | 0.449 cores |
% CPU usage | 87.69 |
Memory | 45.7 MiB |
% memory usage | 35.70 |
% error | 0 |
It looks like this final setup meets all our goals.
When Does It Break?
So far, we have a viable resource setup for our pods, but it's good to know when it will start to fail. To find out, we will stress the API beyond 50 requests per second, using 80, 90, and 100 VUs to see what happens:
Metric | Value (80 VUs) | Value (90 VUs) | Value (100 VUs) |
--- | --- | --- | --- |
P95 | 107.74 ms | 127.63 ms | 204.65 ms |
Requests per second | 57.30 | 64.22 | 68.01 |
CPU | 0.475 cores | 0.491 cores | 0.506 cores |
% CPU usage | 92.77 | 95.89 | 98.82 |
Memory | 46 MiB | 49.1 MiB | 49.5 MiB |
% memory usage | 35.93 | 38.35 | 38.67 |
% error | 0 | 0 | 0 |
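Each of these stress runs only required changing the target value in the options of load.js; for example, the 100 VU run would use something like this, with the thresholds unchanged:

export const options = {
  stages: [
    { duration: '1m', target: 100 }, // ramp-up to 100 VUs
    { duration: '2m', target: 100 }, // stay at 100 VUs
    { duration: '1m', target: 0 },   // ramp-down
  ],
  thresholds: {
    http_req_duration: ['p(95)<200'],
    http_req_failed: ['rate<0.01'],
  },
};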
We can conclude that starting around 57 requests per second, we still respond within the expected time, but the CPU utilization exceeds 90%. At around 68 requests per second, we no longer achieve the expected response time, and the CPU is at its limit.
By defining application requirements, setting up a realistic testing environment, and using appropriate load-testing tools, we can iteratively adjust CPU and memory requests to meet our performance goals. Through systematic testing and monitoring, we can ensure that our application handles the desired load while maintaining acceptable response times, error rates, and resource usage. This approach not only helps in achieving the desired performance but also provides insights into the limits of our setup, allowing for better planning and scaling in production environments. Thank you, and happy coding.