Comprehensive and Advanced Kubernetes Pod Deployment Process
Table of contents
- 1. YAML Creation and Kubectl Command
- 2. YAML to JSON Serialization
- 3. kubectl to API Server Communication
- 4. API Server Processing
- 5. API Server to etcd Communication
- 6. etcd Storage
- 7. API Server Watch Mechanism
- 8. Controller Manager Operation
- 9. Scheduler Operation
- 10. Kubelet Operation
- 11. Status Updates
- 12. Resource Version and Optimistic Concurrency
- 13. Finalizers and Garbage Collection
- Key Concepts Explained:
1. YAML Creation and Kubectl Command
We start with a YAML file defining a Deployment with 3 replicas:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example-container
        image: nginx:1.14.2
The user runs:
kubectl apply -f deployment.yaml
2. YAML to JSON Serialization
kubectl converts YAML to JSON. This process is called serialization - converting a data structure into a format that can be stored or transmitted.
Example of the JSON:
{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "example-deployment"
  },
  "spec": {
    "replicas": 3,
    "selector": {
      "matchLabels": {
        "app": "example"
      }
    },
    "template": {
      "metadata": {
        "labels": {
          "app": "example"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "example-container",
            "image": "nginx:1.14.2"
          }
        ]
      }
    }
  }
}
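To make the serialization step concrete, here is a minimal, self-contained sketch of the same YAML-to-JSON conversion in Go, using the sigs.k8s.io/yaml package (the hard-coded file name is an assumption for illustration, and this is not kubectl's literal code path):

package main

import (
	"fmt"
	"os"

	"sigs.k8s.io/yaml"
)

func main() {
	// Read the manifest that would be passed to `kubectl apply -f`.
	yamlBytes, err := os.ReadFile("deployment.yaml")
	if err != nil {
		panic(err)
	}

	// Convert the YAML document into its JSON representation.
	jsonBytes, err := yaml.YAMLToJSON(yamlBytes)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(jsonBytes))
}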
3. kubectl to API Server Communication
kubectl sends this JSON to the API server as a RESTful HTTP request over TLS. For a brand-new object, kubectl apply issues a POST (create); on subsequent applies it issues a PATCH against the existing object.
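As a rough illustration of what that request amounts to, here is a hedged client-go sketch that creates the same Deployment through the REST API. It assumes a clientset already built from a kubeconfig, and it is not kubectl's internal implementation:

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createExampleDeployment issues POST /apis/apps/v1/namespaces/default/deployments.
func createExampleDeployment(clientset kubernetes.Interface) error {
	replicas := int32(3)
	deployment := &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "example-deployment"},
		Spec: appsv1.DeploymentSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "example"}},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: map[string]string{"app": "example"}},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{
						{Name: "example-container", Image: "nginx:1.14.2"},
					},
				},
			},
		},
	}
	_, err := clientset.AppsV1().Deployments("default").Create(context.TODO(), deployment, metav1.CreateOptions{})
	return err
}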
4. API Server Processing
4.1 Admission Controllers
Before the object is persisted (and after authentication and authorization, covered in 4.2), the request passes through a series of admission controllers. These are plugins that intercept requests to the Kubernetes API server prior to persistence of the object.
Example admission controllers:
ValidatingAdmissionWebhook
MutatingAdmissionWebhook
ResourceQuota
LimitRanger
A simplified snippet from an admission plugin, validating its configuration during initialization:
func (p *Plugin) ValidateInitialization() error {
	if p.configFile != "" {
		config, err := loadConfig(p.configFile)
		if err != nil {
			return fmt.Errorf("failed to load config: %v", err)
		}
		p.config = config
	}
	return nil
}
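To make the webhook variants more concrete, here is a hypothetical, minimal HTTP handler for a validating admission webhook. It rejects Deployments that do not set a replica count; the policy itself is invented purely for illustration:

import (
	"encoding/json"
	"net/http"

	admissionv1 "k8s.io/api/admission/v1"
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func validateDeployment(w http.ResponseWriter, r *http.Request) {
	// Decode the AdmissionReview sent by the API server.
	var review admissionv1.AdmissionReview
	if err := json.NewDecoder(r.Body).Decode(&review); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// The object under review is carried as raw JSON inside the request.
	var deployment appsv1.Deployment
	if err := json.Unmarshal(review.Request.Object.Raw, &deployment); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// Hypothetical policy: every Deployment must set spec.replicas explicitly.
	allowed := deployment.Spec.Replicas != nil
	review.Response = &admissionv1.AdmissionResponse{
		UID:     review.Request.UID,
		Allowed: allowed,
	}
	if !allowed {
		review.Response.Result = &metav1.Status{Message: "spec.replicas must be set"}
	}

	// Echo the review back with the response filled in.
	json.NewEncoder(w).Encode(review)
}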
4.2 Authentication and Authorization
The API server authenticates the request using one or more authenticator modules. It then authorizes the request using RBAC (Role-Based Access Control) or other authorization modules.
RBAC example (note that a Role grants nothing by itself; it must be bound to a user, group, or service account with a RoleBinding):
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: deployment-creator
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["create", "update", "patch", "delete"]
4.3 Request Processing
The API Server, written in Go, receives the JSON and deserializes it into Go structs. Deserialization is the reverse process of serialization, converting a data format back into a data structure.
Example Go struct (simplified):
type Deployment struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              DeploymentSpec   `json:"spec,omitempty"`
	Status            DeploymentStatus `json:"status,omitempty"`
}
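A minimal, self-contained sketch of that deserialization step, decoding a JSON payload into the typed struct with the standard library. The API server's real decoding goes through its codec machinery, so treat this as illustrative only:

package main

import (
	"encoding/json"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
)

func main() {
	payload := []byte(`{
	  "apiVersion": "apps/v1",
	  "kind": "Deployment",
	  "metadata": {"name": "example-deployment"},
	  "spec": {"replicas": 3}
	}`)

	// Deserialize the JSON document into the typed Go struct.
	var deployment appsv1.Deployment
	if err := json.Unmarshal(payload, &deployment); err != nil {
		panic(err)
	}
	fmt.Println(deployment.Name, *deployment.Spec.Replicas)
}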
5. API Server to etcd Communication
The API server needs to store this information in etcd. It uses Protocol Buffers (protobuf) for this communication.
Protocol Buffers: A method for serializing structured data developed by Google. It's more efficient than JSON for machine-to-machine communication.
Example .proto file (simplified):
syntax = "proto3";

message Deployment {
  string api_version = 1;
  string kind = 2;
  ObjectMeta metadata = 3;
  DeploymentSpec spec = 4;
  DeploymentStatus status = 5;
}
The API server uses gRPC to communicate with etcd. gRPC (gRPC Remote Procedure Call) is a high-performance, open-source universal RPC framework developed by Google.
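As a rough illustration, here is a minimal sketch of writing and reading a key with etcd's Go client over gRPC. The endpoint and key are illustrative, and this is of course not the API server's actual storage layer:

package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to a local etcd member over gRPC.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	key := "/registry/deployments/default/example-deployment"
	if _, err := cli.Put(ctx, key, "<serialized object bytes>"); err != nil {
		panic(err)
	}

	resp, err := cli.Get(ctx, key)
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s -> %d bytes\n", kv.Key, len(kv.Value))
	}
}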
6. etcd Storage
etcd itself treats each value as an opaque byte array; by default, Kubernetes stores objects in etcd serialized as Protocol Buffers. etcd persists these key/value pairs using bbolt, a maintained fork of BoltDB.
BoltDB (bbolt): An embedded key/value database for Go. It uses a B+tree structure for efficient storage and retrieval.
etcd's storage format (conceptual, not actual code):
Key: /registry/deployments/default/example-deployment
Value: <binary data>
The binary data is a serialized representation of the Kubernetes object.
6.1 Etcd Consistency and Raft Protocol
etcd uses the Raft consensus algorithm to maintain consistency across its cluster. This ensures that all etcd nodes agree on the state of the system.
Raft pseudocode (simplified):
while true:
    switch state:
    case follower:
        listen for heartbeats from leader
        if election timeout elapses:
            become candidate
    case candidate:
        increment term
        vote for self
        request votes from other nodes
        if receive majority of votes:
            become leader
        else if receive heartbeat from valid leader:
            become follower
    case leader:
        send heartbeats to all other nodes
        replicate log entries to followers
7. API Server Watch Mechanism
The API server uses a watch mechanism to efficiently notify clients of changes. A watch is served as a long-lived HTTP streaming response (chunked transfer encoding) rather than repeated polling.
Example Go code for creating a watch:
watcher, err := client.AppsV1().Deployments(namespace).Watch(context.TODO(), metav1.ListOptions{
	FieldSelector: fields.OneTermEqualSelector("metadata.name", deploymentName).String(),
})
if err != nil {
	panic(err)
}
for event := range watcher.ResultChan() {
	deployment, ok := event.Object.(*appsv1.Deployment)
	if !ok {
		continue // e.g. a *metav1.Status error event
	}
	fmt.Printf("Deployment %s has been %s\n", deployment.Name, event.Type)
}
8. Controller Manager Operation
The Deployment controller in the controller manager uses a "lister-watcher" pattern to observe changes.
Lister-Watcher: A Go interface that allows for efficient watching of Kubernetes resources.
type ListerWatcher interface {
	List(options metav1.ListOptions) (runtime.Object, error)
	Watch(options metav1.ListOptions) (watch.Interface, error)
}
When the Deployment controller notices the new Deployment, it creates a ReplicaSet; the ReplicaSet controller then creates the individual Pods.
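In practice the lister-watcher is consumed through a shared informer, which lists and watches on the controller's behalf and caches the results. A hedged sketch of that wiring with client-go (the resync period and handler body are illustrative):

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func watchDeployments(clientset kubernetes.Interface, stopCh <-chan struct{}) {
	// The factory's informers share a single watch connection per resource type.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	informer := factory.Apps().V1().Deployments().Informer()

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			d := obj.(*appsv1.Deployment)
			// A real controller would enqueue the object's key here instead of printing.
			fmt.Println("observed new Deployment:", d.Namespace+"/"+d.Name)
		},
	})

	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
	<-stopCh
}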
8.1 Controller Reconciliation Loop
Controllers use a reconciliation loop to continuously move the current state towards the desired state. This is implemented using work queues and multiple worker goroutines.
Simplified reconciliation loop:
func (c *Controller) worker() {
	for c.processNextWorkItem() {
	}
}

func (c *Controller) processNextWorkItem() bool {
	key, quit := c.queue.Get()
	if quit {
		return false
	}
	defer c.queue.Done(key)

	// syncHandler reconciles the object identified by key (namespace/name).
	err := c.syncHandler(key.(string))
	if err == nil {
		// Success: reset any rate-limiting history for this key.
		c.queue.Forget(key)
		return true
	}

	// Failure: requeue with backoff so it is retried later.
	c.queue.AddRateLimited(key)
	return true
}
9. Scheduler Operation
The scheduler also uses the lister-watcher pattern. When it notices Pods that have not yet been assigned a node, it runs its scheduling algorithm, first filtering out nodes that cannot run the Pod and then scoring the remainder, to decide which node each Pod should run on.
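The decision itself is recorded as another API object: the scheduler posts a Binding for the Pod. A minimal sketch of that step (pod and node names are placeholders, and the real scheduler first runs its framework plugins):

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// bindPod records the scheduling decision by creating a Binding for the Pod.
func bindPod(clientset kubernetes.Interface, namespace, podName, nodeName string) error {
	binding := &corev1.Binding{
		ObjectMeta: metav1.ObjectMeta{Name: podName, Namespace: namespace},
		Target:     corev1.ObjectReference{Kind: "Node", Name: nodeName},
	}
	return clientset.CoreV1().Pods(namespace).Bind(context.TODO(), binding, metav1.CreateOptions{})
}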
9.1 Scheduler Extender
Kubernetes allows extending the scheduler's functionality using a Scheduler Extender. This is an HTTP/HTTPS endpoint that can be called by the scheduler to filter or prioritize nodes.
Example scheduler extender configuration:
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
extenders:
- urlPrefix: "http://localhost:8888/"
  filterVerb: "filter"
  prioritizeVerb: "prioritize"
  weight: 1
  enableHTTPS: false
10. Kubelet Operation
Kubelets on the chosen nodes watch the API server (over HTTPS, typically with Protocol Buffers as the wire format) for Pods bound to their node.
They then use the Container Runtime Interface (CRI) to instruct the container runtime (e.g., containerd or CRI-O) to pull images and start the containers.
10.1 CRI (Container Runtime Interface) Details
The CRI uses Protocol Buffers and gRPC to define the interface between the kubelet and container runtime. Here's a more detailed look at some CRI methods:
service RuntimeService {
  rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}
  rpc StartContainer(StartContainerRequest) returns (StartContainerResponse) {}
  rpc StopContainer(StopContainerRequest) returns (StopContainerResponse) {}
  rpc RemoveContainer(RemoveContainerRequest) returns (RemoveContainerResponse) {}
  rpc ListContainers(ListContainersRequest) returns (ListContainersResponse) {}
  rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}
  // ... other methods
}

service ImageService {
  rpc ListImages(ListImagesRequest) returns (ListImagesResponse) {}
  rpc ImageStatus(ImageStatusRequest) returns (ImageStatusResponse) {}
  rpc PullImage(PullImageRequest) returns (PullImageResponse) {}
  rpc RemoveImage(RemoveImageRequest) returns (RemoveImageResponse) {}
  // ... other methods
}
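As a rough illustration of how a CRI client talks to the runtime, here is a hedged sketch that dials a containerd CRI socket and asks for its version. The socket path and the use of the k8s.io/cri-api generated client are assumptions for illustration, not the kubelet's exact code:

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func printRuntimeVersion() error {
	// Dial the container runtime's CRI endpoint over a local unix socket.
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return err
	}
	defer conn.Close()

	client := runtimeapi.NewRuntimeServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	resp, err := client.Version(ctx, &runtimeapi.VersionRequest{})
	if err != nil {
		return err
	}
	fmt.Println(resp.RuntimeName, resp.RuntimeVersion)
	return nil
}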
10.2 CNI (Container Network Interface)
The CNI is used by Kubernetes to set up networking for containers. It involves plugins that configure network interfaces in container network namespaces.
Example CNI configuration:
{
  "cniVersion": "0.4.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}
11. Status Updates
As pods are created and become ready, kubelets update their status. This information flows back through the system:
Kubelet → API Server → etcd
Each hop involves serialization and deserialization: the kubelet reports status to the API server over HTTPS (typically encoded as Protocol Buffers), and the API server persists the change to etcd over gRPC.
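A hedged sketch of what a status write looks like through client-go; this is not the kubelet's actual code path (the kubelet has its own status manager), but it shows the status subresource being updated:

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// markPodRunning updates only the status subresource of a Pod.
func markPodRunning(clientset kubernetes.Interface, namespace, podName string) error {
	pod, err := clientset.CoreV1().Pods(namespace).Get(context.TODO(), podName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	pod.Status.Phase = corev1.PodRunning
	_, err = clientset.CoreV1().Pods(namespace).UpdateStatus(context.TODO(), pod, metav1.UpdateOptions{})
	return err
}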
12. Resource Version and Optimistic Concurrency
Kubernetes uses resource versions for optimistic concurrency control. Each object in etcd has a resource version that's updated on every modification.
Example of using the resource version as a precondition in a patch operation (the server rejects the patch with a conflict if the object has changed since that version):
patchBytes, err := json.Marshal(map[string]interface{}{
	"metadata": map[string]interface{}{
		// Precondition: the patch fails with a 409 Conflict if the live
		// object's resourceVersion no longer matches.
		"resourceVersion": "1234",
		"labels": map[string]string{
			"new-label": "new-value",
		},
	},
})
if err != nil {
	panic(err)
}
_, err = client.AppsV1().Deployments(namespace).Patch(context.TODO(), deploymentName, types.StrategicMergePatchType, patchBytes, metav1.PatchOptions{})
if err != nil {
	panic(err)
}
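When the precondition fails, the server returns a 409 Conflict; the conventional client-side answer is to re-read the object and retry. A minimal sketch using client-go's retry helper (the label being set is illustrative):

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

func addLabelWithRetry(clientset kubernetes.Interface, namespace, name string) error {
	// RetryOnConflict re-runs the function whenever the update hits a 409 Conflict.
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		deployment, err := clientset.AppsV1().Deployments(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		if deployment.Labels == nil {
			deployment.Labels = map[string]string{}
		}
		deployment.Labels["new-label"] = "new-value"
		_, err = clientset.AppsV1().Deployments(namespace).Update(context.TODO(), deployment, metav1.UpdateOptions{})
		return err
	})
}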
13. Finalizers and Garbage Collection
Kubernetes uses finalizers to implement pre-delete hooks. They're part of the object's metadata and prevent the object from being deleted until specific conditions are met.
Example of adding a finalizer:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
  finalizers:
  - example.com/cleanup-protection   # a custom finalizer, removed by its controller once cleanup is done
spec:
  # ... rest of the deployment spec
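When the cleanup work is finished, the owning controller removes its finalizer so deletion can proceed. A hedged sketch of that removal (the finalizer name matches the hypothetical one above):

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func removeFinalizer(clientset kubernetes.Interface, namespace, name string) error {
	deployment, err := clientset.AppsV1().Deployments(namespace).Get(context.TODO(), name, metav1.GetOptions{})
	if err != nil {
		return err
	}

	// Drop only our finalizer; any others stay so their owners can still act.
	kept := deployment.Finalizers[:0]
	for _, f := range deployment.Finalizers {
		if f != "example.com/cleanup-protection" {
			kept = append(kept, f)
		}
	}
	deployment.Finalizers = kept

	_, err = clientset.AppsV1().Deployments(namespace).Update(context.TODO(), deployment, metav1.UpdateOptions{})
	return err
}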
Key Concepts Explained:
Serialization: Converting data structures or objects into a format that can be stored or transmitted.
Deserialization: Converting serialized data back into its original data structure or object.
JSON (JavaScript Object Notation): A lightweight data interchange format that's easy for humans to read and write.
Protocol Buffers: A method of serializing structured data that's more efficient than JSON for machine-to-machine communication.
gRPC: A high-performance, open-source universal RPC (Remote Procedure Call) framework.
etcd: A distributed key-value store used as Kubernetes' backing store for all cluster data.
BoltDB (bbolt): An embedded key/value database for Go; etcd uses bbolt, a maintained fork of BoltDB, as its storage engine.
Go: The programming language used to write most of Kubernetes' components.
Lister-Watcher: A pattern in Kubernetes for efficiently observing and reacting to changes in resources.
Container Runtime Interface (CRI): A plugin interface which enables kubelet to use a variety of container runtimes.
Container Network Interface (CNI): A specification and libraries for configuring network interfaces in Linux containers.
RBAC (Role-Based Access Control): A method of regulating access to computer or network resources based on the roles of individual users.
Admission Controllers: Plugins that intercept requests to the Kubernetes API server prior to the persistence of the object.
Raft Consensus Algorithm: A protocol for implementing distributed consensus, used by etcd for consistency.
Reconciliation Loop: A control loop in Kubernetes controllers that continuously works to make the current state match the desired state.
Scheduler Extender: A way to add new scheduling rules to Kubernetes without modifying the scheduler code.
Optimistic Concurrency Control: A method used in Kubernetes to handle concurrent resource modifications.
Finalizers: A way to ensure that certain cleanup operations are performed before a resource is deleted.
This comprehensive explanation covers the entire process of deploying a pod in Kubernetes, from the initial YAML file to the final container creation and beyond. It includes details on data formats, communication protocols, storage mechanisms, and the roles of various Kubernetes components, as well as advanced concepts like admission controllers, RBAC, scheduler extenders, and finalizers. The inclusion of code snippets and configuration examples provides a practical context for these concepts.
This level of detail would be valuable for Kubernetes administrators, developers working on Kubernetes itself, or anyone looking to gain a deep technical understanding of the system. It demonstrates how Kubernetes handles challenges in distributed systems, concurrency, networking, and resource management, showcasing the depth and complexity of its architecture.
Acknowledgments
I’d like to extend my gratitude to ClaudeAI for providing valuable insights and detailed explanations that enriched this article. The assistance was instrumental in crafting a comprehensive overview of Kubernetes processes.