Comprehensive and Advanced Kubernetes Pod Deployment Process

Hari Kiran B
8 min read

1. YAML Creation and Kubectl Command

We start with a YAML file defining a Deployment with 3 replicas:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example-container
        image: nginx:1.14.2

The user runs:

kubectl apply -f deployment.yaml

2. YAML to JSON Serialization

kubectl parses the YAML and re-serializes it as JSON for the HTTP request. This process is called serialization: converting a data structure into a format that can be stored or transmitted.

Example of the JSON:

{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "example-deployment"
  },
  "spec": {
    "replicas": 3,
    "selector": {
      "matchLabels": {
        "app": "example"
      }
    },
    "template": {
      "metadata": {
        "labels": {
          "app": "example"
        }
      },
      "spec": {
        "containers": [
          {
            "name": "example-container",
            "image": "nginx:1.14.2"
          }
        ]
      }
    }
  }
}

3. kubectl to API Server Communication

kubectl sends this JSON to the API server as a RESTful HTTP request over TLS. A brand-new object is created with a POST to the resource collection; for an existing object, kubectl apply sends a PATCH instead.

4. API Server Processing

4.1 Admission Controllers

After a request has been authenticated and authorized (covered next) but before the object is persisted, it passes through a series of admission controllers. These are plugins that intercept requests to the Kubernetes API server prior to persistence of the object.

Example admission controllers:

  • ValidatingAdmissionWebhook

  • MutatingAdmissionWebhook

  • ResourceQuota

  • LimitRanger

Code snippet (simplified, in the style of an in-tree admission plugin) showing a plugin validating its own configuration during initialization:

func (p *Plugin) ValidateInitialization() error {
    if p.configFile != "" {
        config, err := loadConfig(p.configFile)
        if err != nil {
            return fmt.Errorf("failed to load config: %v", err)
        }
        p.config = config
    }
    return nil
}

4.2 Authentication and Authorization

The API server authenticates the request using one or more authenticator modules. It then authorizes the request using RBAC (Role-Based Access Control) or other authorization modules.

RBAC example:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: deployment-creator
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["create", "update", "patch", "delete"]
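To make the rule semantics concrete, here is a toy version of RBAC rule matching in Go. It mirrors the deployment-creator Role above; the real authorizer additionally handles wildcards, resourceNames, and non-resource URLs:

```go
package main

import "fmt"

// Rule is a simplified stand-in for an RBAC PolicyRule.
type Rule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether the list covers s, treating "*" as a wildcard.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s || v == "*" {
			return true
		}
	}
	return false
}

// allowed reports whether any rule covers the request's group, resource,
// and verb, which is the core of RBAC authorization.
func allowed(rules []Rule, group, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, group) && contains(r.Resources, resource) && contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	// Mirrors the deployment-creator Role above.
	rules := []Rule{{
		APIGroups: []string{"apps"},
		Resources: []string{"deployments"},
		Verbs:     []string{"create", "update", "patch", "delete"},
	}}
	fmt.Println(allowed(rules, "apps", "deployments", "create")) // true
	fmt.Println(allowed(rules, "apps", "deployments", "get"))    // false
}
```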

4.3 Request Processing

The API Server, written in Go, receives the JSON and deserializes it into Go structs. Deserialization is the reverse process of serialization, converting a data format back into a data structure.

Example Go struct (simplified):

type Deployment struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec              DeploymentSpec   `json:"spec,omitempty"`
    Status            DeploymentStatus `json:"status,omitempty"`
}

5. API Server to etcd Communication

The API server needs to store this information in etcd. It uses Protocol Buffers (protobuf) for this communication.

Protocol Buffers: A method for serializing structured data developed by Google. It's more efficient than JSON for machine-to-machine communication.

Example .proto file (simplified):

syntax = "proto3";

message Deployment {
    string api_version = 1;
    string kind = 2;
    ObjectMeta metadata = 3;
    DeploymentSpec spec = 4;
    DeploymentStatus status = 5;
}

The API server uses gRPC to communicate with etcd. gRPC (gRPC Remote Procedure Call) is a high-performance, open-source universal RPC framework developed by Google.

6. etcd Storage

etcd stores key/value pairs in bbolt (a maintained fork of BoltDB) and treats each value as an opaque byte string; for Kubernetes objects, those bytes are the protobuf-serialized object written by the API server.

BoltDB/bbolt: An embedded key/value database for Go. It uses a B+tree structure for efficient storage and retrieval.

etcd's storage layout (conceptual, not actual code):

Key: /registry/deployments/default/example-deployment
Value: <binary data>

The binary data is a serialized representation of the Kubernetes object.
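The key layout can be expressed as a tiny helper. The pattern /registry/&lt;resource&gt;/&lt;namespace&gt;/&lt;name&gt; matches how namespaced objects are keyed; the helper itself is illustrative, not actual etcd or apiserver code:

```go
package main

import "fmt"

// registryKey builds the etcd key for a namespaced Kubernetes object,
// following the /registry/<resource>/<namespace>/<name> convention.
func registryKey(resource, namespace, name string) string {
	return fmt.Sprintf("/registry/%s/%s/%s", resource, namespace, name)
}

func main() {
	fmt.Println(registryKey("deployments", "default", "example-deployment"))
	// /registry/deployments/default/example-deployment
}
```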

6.1 Etcd Consistency and Raft Protocol

etcd uses the Raft consensus algorithm to maintain consistency across its cluster. This ensures that all etcd nodes agree on the state of the system.

Raft pseudocode (simplified):

while true:
    switch state
    case follower:
        listen for heartbeats from leader
        if election timeout elapses:
            become candidate
    case candidate:
        increment term
        vote for self
        request votes from other nodes
        if receive majority of votes:
            become leader
        else if receive heartbeat from valid leader:
            become follower
    case leader:
        send heartbeats to all other nodes
        replicate log entries to followers
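One election round from that pseudocode can be modeled as a pure function. This is a toy sketch of the majority-vote rule only, not etcd's Raft implementation:

```go
package main

import "fmt"

type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// runElection tallies one candidate round: the candidate votes for
// itself, counts the votes granted by its peers, and becomes leader
// only with a strict majority of the cluster.
func runElection(clusterSize int, grants []bool) State {
	votes := 1 // vote for self
	for _, granted := range grants {
		if granted {
			votes++
		}
	}
	if votes > clusterSize/2 {
		return Leader
	}
	return Candidate // wait for the next election timeout and retry
}

func main() {
	// 5-node cluster: two peers grant their vote, so 3/5 is a majority.
	fmt.Println(runElection(5, []bool{true, true, false, false}) == Leader) // true
}
```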

7. API Server Watch Mechanism

The API server uses a watch mechanism to efficiently notify clients of changes. This is implemented as a long-lived HTTP request whose response is streamed in chunks as events occur; it is not classic polling, though some clients connect over WebSockets instead.

Example Go code for creating a watch:

// Watch returns a watch.Interface streaming events for the named Deployment.
watcher, err := client.AppsV1().Deployments(namespace).Watch(context.TODO(), metav1.ListOptions{
    FieldSelector: fields.OneTermEqualSelector("metadata.name", deploymentName).String(),
})
if err != nil {
    panic(err)
}
for event := range watcher.ResultChan() {
    deployment := event.Object.(*appsv1.Deployment)
    fmt.Printf("Deployment %s has been %s\n", deployment.Name, event.Type)
}

8. Controller Manager Operation

The Deployment controller in the controller manager uses a "lister-watcher" pattern to observe changes.

Lister-Watcher: A Go interface that allows for efficient watching of Kubernetes resources.

type ListerWatcher interface {
    List(options metav1.ListOptions) (runtime.Object, error)
    Watch(options metav1.ListOptions) (watch.Interface, error)
}

When the controller notices the new Deployment, it creates a ReplicaSet.

8.1 Controller Reconciliation Loop

Controllers use a reconciliation loop to continuously move the current state towards the desired state. This is implemented using work queues and multiple worker goroutines.

Simplified reconciliation loop:

func (c *Controller) worker() {
    for c.processNextWorkItem() {
    }
}

func (c *Controller) processNextWorkItem() bool {
    key, quit := c.queue.Get()
    if quit {
        return false
    }
    defer c.queue.Done(key)

    err := c.syncHandler(key.(string))
    if err == nil {
        c.queue.Forget(key)
        return true
    }

    c.queue.AddRateLimited(key)
    return true
}

9. Scheduler Operation

The scheduler also uses the lister-watcher pattern. When it notices unscheduled pods, it runs its scheduling algorithm to decide which nodes should run the pods.
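The two phases of that algorithm, filtering out infeasible nodes and then scoring the remainder, can be sketched as follows. The single free-CPU criterion is an illustrative stand-in for the many plugins the real scheduler runs:

```go
package main

import "fmt"

type Node struct {
	Name    string
	FreeCPU int // millicores
}

// schedule mimics the two scheduler phases: filter out nodes that cannot
// fit the pod's CPU request, then score the survivors (most free CPU wins).
func schedule(nodes []Node, cpuRequest int) (string, bool) {
	best, found := "", false
	bestFree := -1
	for _, n := range nodes {
		if n.FreeCPU < cpuRequest { // filter phase
			continue
		}
		if n.FreeCPU > bestFree { // score phase
			best, bestFree, found = n.Name, n.FreeCPU, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{{"node-a", 500}, {"node-b", 2000}, {"node-c", 100}}
	name, ok := schedule(nodes, 250)
	fmt.Println(name, ok) // node-b true
}
```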

9.1 Scheduler Extender

Kubernetes allows extending the scheduler's functionality using a Scheduler Extender. This is an HTTP/HTTPS endpoint that can be called by the scheduler to filter or prioritize nodes.

Example scheduler extender configuration (using the v1 config API; older clusters used v1beta1):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
extenders:
- urlPrefix: "http://localhost:8888/"
  filterVerb: "filter"
  prioritizeVerb: "prioritize"
  weight: 1
  enableHTTPS: false

10. Kubelet Operation

Kubelets on the chosen nodes watch the API server for pods bound to their node. This communication is ordinary HTTPS REST, the same watch mechanism other clients use, typically with Protocol Buffers as the content type (the kubelet does not speak gRPC to the API server).

They then use the Container Runtime Interface (CRI) to instruct the container runtime (e.g., containerd or CRI-O) to run the containers.

10.1 CRI (Container Runtime Interface) Details

The CRI uses Protocol Buffers and gRPC to define the interface between the kubelet and container runtime. Here's a more detailed look at some CRI methods:

service RuntimeService {
    rpc CreateContainer(CreateContainerRequest) returns (CreateContainerResponse) {}
    rpc StartContainer(StartContainerRequest) returns (StartContainerResponse) {}
    rpc StopContainer(StopContainerRequest) returns (StopContainerResponse) {}
    rpc RemoveContainer(RemoveContainerRequest) returns (RemoveContainerResponse) {}
    rpc ListContainers(ListContainersRequest) returns (ListContainersResponse) {}
    rpc ContainerStatus(ContainerStatusRequest) returns (ContainerStatusResponse) {}
    // ... other methods
}

service ImageService {
    rpc ListImages(ListImagesRequest) returns (ListImagesResponse) {}
    rpc ImageStatus(ImageStatusRequest) returns (ImageStatusResponse) {}
    rpc PullImage(PullImageRequest) returns (PullImageResponse) {}
    rpc RemoveImage(RemoveImageRequest) returns (RemoveImageResponse) {}
    // ... other methods
}

10.2 CNI (Container Network Interface)

The CNI is used by Kubernetes to set up networking for containers. It involves plugins that configure network interfaces in container network namespaces.

Example CNI configuration:

{
    "cniVersion": "0.4.0",
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ]
    }
}

11. Status Updates

As pods are created and become ready, kubelets update their status. This information flows back through the system:

Kubelet → API Server → etcd

Each hop involves serialization, transport, and deserialization: the kubelet talks to the API server over HTTPS (with Protocol Buffers as the usual content type), and the API server talks to etcd over gRPC.

12. Resource Version and Optimistic Concurrency

Kubernetes uses resource versions for optimistic concurrency control. Each object in etcd has a resource version that's updated on every modification.

Example of using resource version in a patch operation:

patchBytes, err := json.Marshal(map[string]interface{}{
    "metadata": map[string]interface{}{
        // Including the resourceVersion makes the patch fail with a
        // 409 Conflict if the object changed since it was last read.
        "resourceVersion": "1234",
        "labels": map[string]string{
            "new-label": "new-value",
        },
    },
})
if err != nil {
    panic(err)
}

_, err = client.AppsV1().Deployments(namespace).Patch(context.TODO(), deploymentName, types.StrategicMergePatchType, patchBytes, metav1.PatchOptions{})
if err != nil {
    panic(err)
}
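The compare-and-swap rule behind optimistic concurrency can be sketched with an in-memory store. This is a toy model of the semantics, not the real API server storage layer:

```go
package main

import (
	"errors"
	"fmt"
)

type object struct {
	ResourceVersion int
	Data            string
}

var errConflict = errors.New("409 Conflict: resourceVersion mismatch")

// update succeeds only if the caller read the latest version, the same
// compare-and-swap rule the API server applies against etcd.
func update(store map[string]*object, key string, seenVersion int, data string) error {
	cur := store[key]
	if cur.ResourceVersion != seenVersion {
		return errConflict
	}
	cur.Data = data
	cur.ResourceVersion++ // every write bumps the version
	return nil
}

func main() {
	store := map[string]*object{
		"deployments/example": {ResourceVersion: 1, Data: "replicas=3"},
	}
	// Two clients both read version 1; only the first write wins.
	fmt.Println(update(store, "deployments/example", 1, "replicas=5")) // <nil>
	fmt.Println(update(store, "deployments/example", 1, "replicas=4")) // 409 Conflict: resourceVersion mismatch
}
```

A client that receives the conflict is expected to re-read the object, reapply its change, and retry.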

13. Finalizers and Garbage Collection

Kubernetes uses finalizers to implement pre-delete hooks. They're part of the object's metadata and prevent the object from being deleted until specific conditions are met.

Example of adding a custom finalizer (the name here is illustrative; built-in finalizers such as kubernetes.io/pv-protection are managed by Kubernetes itself and apply to specific resource types like PersistentVolumes):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment
  finalizers:
  - example.com/cleanup-hook
spec:
  # ... rest of the deployment spec

Key Concepts Explained:

  1. Serialization: Converting data structures or objects into a format that can be stored or transmitted.

  2. Deserialization: Converting serialized data back into its original data structure or object.

  3. JSON (JavaScript Object Notation): A lightweight data interchange format that's easy for humans to read and write.

  4. Protocol Buffers: A method of serializing structured data that's more efficient than JSON for machine-to-machine communication.

  5. gRPC: A high-performance, open-source universal RPC (Remote Procedure Call) framework.

  6. etcd: A distributed key-value store used as Kubernetes' backing store for all cluster data.

7. BoltDB: An embedded key/value database for Go; etcd's storage engine is bbolt, a maintained fork of BoltDB.

  8. Go: The programming language used to write most of Kubernetes' components.

  9. Lister-Watcher: A pattern in Kubernetes for efficiently observing and reacting to changes in resources.

  10. Container Runtime Interface (CRI): A plugin interface which enables kubelet to use a variety of container runtimes.

  11. Container Network Interface (CNI): A specification and libraries for configuring network interfaces in Linux containers.

  12. RBAC (Role-Based Access Control): A method of regulating access to computer or network resources based on the roles of individual users.

  13. Admission Controllers: Plugins that intercept requests to the Kubernetes API server prior to the persistence of the object.

  14. Raft Consensus Algorithm: A protocol for implementing distributed consensus, used by etcd for consistency.

  15. Reconciliation Loop: A control loop in Kubernetes controllers that continuously works to make the current state match the desired state.

  16. Scheduler Extender: A way to add new scheduling rules to Kubernetes without modifying the scheduler code.

  17. Optimistic Concurrency Control: A method used in Kubernetes to handle concurrent resource modifications.

  18. Finalizers: A way to ensure that certain cleanup operations are performed before a resource is deleted.

This comprehensive explanation covers the entire process of deploying a pod in Kubernetes, from the initial YAML file to the final container creation and beyond. It includes details on data formats, communication protocols, storage mechanisms, and the roles of various Kubernetes components, as well as advanced concepts like admission controllers, RBAC, scheduler extenders, and finalizers. The inclusion of code snippets and configuration examples provides a practical context for these concepts.

This level of detail would be valuable for Kubernetes administrators, developers working on Kubernetes itself, or anyone looking to gain a deep technical understanding of the system. It demonstrates how Kubernetes handles challenges in distributed systems, concurrency, networking, and resource management, showcasing the depth and complexity of its architecture.

Acknowledgments

I’d like to extend my gratitude to ClaudeAI for providing valuable insights and detailed explanations that enriched this article. The assistance was instrumental in crafting a comprehensive overview of Kubernetes processes.
