Fortifying AWS EKS: A Comprehensive Guide to Securing Containerized Workloads with AWS and Open-Source Tools


1. Introduction
Securing containerized workloads in today's cloud-centric landscape is paramount to ensuring robust and reliable application deployment. Amazon Web Services (AWS) Elastic Kubernetes Service (EKS) provides a scalable platform for running containerized applications, but it demands a comprehensive security strategy to protect against evolving threats. This document outlines a solution for securing AWS EKS workloads by integrating native AWS security services with open-source tools. The approach leverages Azure DevOps for CI/CD pipelines, Terraform for Infrastructure as Code (IaC), and a combination of AWS tools (Security Hub, GuardDuty, Inspector, Config) and open-source solutions (Checkov, Trivy, gVisor, Falco, FalcoSidekick) to address vulnerabilities, misconfigurations, and runtime threats. This project aims to provide a scalable, secure, and efficient framework for deploying and monitoring containerized applications on AWS, while identifying limitations and areas for improvement.
2. Architecture design
2.1. Technology choice
2.1.1. Azure DevOps for CI/CD
Why Chosen:
Team already uses Azure DevOps, so there is no learning curve.
AWS CodeCommit is being deprecated, making Azure DevOps a solid replacement.
The AWS Toolkit for Azure DevOps integrates well with AWS for EKS deployments.
Trade-offs:
The Kubernetes API server must be exposed to the internet for Azure DevOps to deploy to EKS, which is a security risk.
Setting up a VPN for secure access is safer, but it adds configuration complexity.
2.1.2. Terraform for Infrastructure as Code
Why Chosen:
Works across clouds, unlike CloudFormation or CDK, giving flexibility if we move beyond AWS.
Huge community and module library speed up setup compared to CloudFormation’s custom templates.
Easier to version control and review than CDK’s programmatic code.
Trade-offs:
State files need secure storage (e.g., S3), which is trickier than CloudFormation’s built-in state management.
HCL lacks the logic of real programming languages like CDK’s Python, making complex configs harder.
AWS features hit CloudFormation/CDK first, so Terraform might lag for new services.
2.1.3. Native AWS Security Tools
What I chose:
Security Hub: Centralizes alerts from many sources for easy tracking.
GuardDuty: Detects threats like weird API calls with little setup.
Inspector: Scans EKS and EC2 for vulnerabilities.
AWS Config: Checks resource configs for compliance.
Why chosen:
Integrate seamlessly with each other and with the rest of AWS.
Provide runtime monitoring and threat detection out of the box.
Trade-offs:
Costs can spike, especially for AWS Config, if misconfigured.
Limited to AWS scope, missing deep container or app-level issues.
2.1.4. Open-Source Tools
What I chose:
Checkov: Scans Terraform code, Azure Pipelines scripts, Dockerfiles, and more for misconfigurations (e.g., open ports, weak IAM policies) before deployment.
Trivy: Scans container images for vulnerabilities, ensuring EKS workloads are clean before Amazon Inspector comes into play.
gVisor: Isolates containers from the Linux host, hardening both against container escapes and privilege escalation.
Falco: Detects malicious behavior in hosts and containers based on a configurable rule set.
FalcoSidekick: Forwards Falco's real-time alerts to third parties, especially AWS Lambda.
Why chosen:
Pipeline fit: Checkov scans Terraform for misconfigs and Trivy checks container images, catching issues early in Azure DevOps.
Extra security: gVisor sandboxes containers against kernel exploits; Falco detects runtime threats in real-time.
Fix AWS gaps: AWS tools (Security Hub, GuardDuty, Inspector, Config) miss container and runtime details. Checkov secures IaC, Trivy ensures clean images, gVisor blocks low-level attacks, and Falco spots pod anomalies.
Trade-offs:
Although these tools have large communities, not all of them are equally well-documented.
Checkov and Trivy slow the pipeline, but the overhead is acceptable. See Alternative open-source security tools for pipeline protection.
gVisor: Hurts performance and has a complex setup.
Falco: Complex setup when using the FalcoSidekick + AWS Lambda output.
2.2. CI/CD
2.2.1. Infrastructure pipeline
Overview of the infrastructure pipeline
Infrastructure pipeline - flow chart
IAM role preparation steps:
Create a LabCICDFullAccess policy that grants full permissions to all required resources, then attach it to LabCICDInfraRole.
Create a CloudTrail trail to record LabCICDInfraRole activity.
After creating and destroying the full infrastructure, generate a policy based on the recorded CloudTrail events.
Review the generated policy, attach it to LabCICDInfraRole, and detach LabCICDFullAccess.
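The last two preparation steps boil down to folding the recorded CloudTrail events into a least-privilege policy draft. A minimal sketch of that idea in Python follows; the event shape and the helper name are illustrative, not the exact tooling used in the lab (IAM Access Analyzer can also generate such policies from CloudTrail automatically).

```python
import json

def policy_from_cloudtrail_events(events):
    """Fold CloudTrail events into a least-privilege IAM policy draft.

    Each event contributes one action "<service>:<EventName>", where the
    service prefix is derived from eventSource (e.g. "eks.amazonaws.com" -> "eks").
    """
    actions = sorted({
        f'{e["eventSource"].split(".")[0]}:{e["eventName"]}'
        for e in events
    })
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }

# Hypothetical events recorded while LabCICDInfraRole built the lab stack.
events = [
    {"eventSource": "eks.amazonaws.com", "eventName": "CreateCluster"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "CreateVpc"},
    {"eventSource": "eks.amazonaws.com", "eventName": "DeleteCluster"},
]
print(json.dumps(policy_from_cloudtrail_events(events), indent=2))
```

A real pass would also narrow the Resource ARNs instead of leaving `"*"`; the review step above exists precisely for that.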
2.2.2. Application pipeline
Overview of application pipeline
The application pipeline assumes a different role (LabCICDApplicationRole). Because this role only needs access to ECR and EKS, we simply attach the LabApplicationPublishingPolicy below:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ecr:GetAuthorizationToken"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:DescribeImages",
        "ecr:BatchGetImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage"
      ],
      "Resource": "arn:aws:ecr:ap-southeast-1:917566871600:repository/lab/hello-app"
    },
    {
      "Effect": "Allow",
      "Action": ["eks:DescribeCluster", "eks:AccessKubernetesApi"],
      "Resource": "arn:aws:eks:ap-southeast-1:917566871600:cluster/lab-eks"
    }
  ]
}
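As a sanity check on a policy like this, a small script can flag statements that pair actions with a wildcard resource, which is the kind of over-grant the scoped ARNs above are meant to avoid. This is a minimal sketch with the policy abbreviated to the relevant parts; the allow-list reflects that ecr:GetAuthorizationToken cannot be scoped to a resource.

```python
import json

# Abbreviated copy of LabApplicationPublishingPolicy, for illustration only.
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["ecr:GetAuthorizationToken"], "Resource": "*"},
    {"Effect": "Allow", "Action": ["ecr:PutImage"],
     "Resource": "arn:aws:ecr:ap-southeast-1:917566871600:repository/lab/hello-app"}
  ]
}
""")

# Actions that are acceptable on Resource "*".
WILDCARD_OK = {"ecr:GetAuthorizationToken"}

def over_grants(doc):
    """Return actions allowed on Resource "*" that are not on the allow-list."""
    found = []
    for stmt in doc["Statement"]:
        if stmt["Effect"] == "Allow" and stmt["Resource"] == "*":
            found += [a for a in stmt["Action"] if a not in WILDCARD_OK]
    return found

print(over_grants(policy))  # an empty list means the policy is scoped as intended
```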
Application pipeline - flow chart
2.3. Workloads
Overview of AWS infrastructure
Why do we place worker nodes inside a private subnet and then use a NAT gateway to provide egress traffic to the internet?
Because Amazon ECR and the other Docker registries are not inside the VPC, and nodes need to pull images (e.g., the application, Falco, etc.). But placing nodes on the public internet increases the attack surface, which is not best practice.
Why do we place the Amazon EKS on both the public subnet and the private subnet?
Flexibility. The Kubernetes control plane can communicate with worker nodes (kubelet) inside the private subnet and expose the Kubernetes API server publicly if needed. In this solution, the Kubernetes API server must be reachable from Azure DevOps and even CloudShell.
But exposing the Kubernetes API server to the internet is not best practice!?
I know. But setting up a VPN solution for the pipeline and development shell introduces more complexity. For now, let's keep it simple.
What is the purpose of the EC2 Instance Connect Endpoint?
Worker nodes (EC2) live inside the private subnet, and only traffic from the worker nodes' security group is allowed on port 22. During development, we still need SSH access to the nodes from outside. That's where the EC2 Instance Connect Endpoint comes into play.
VPC configuration
Application Load Balancer forwards traffic to the target group
Security Group configuration
The rule that allows ingress traffic from anywhere to TCP 443 on the control plane's security group is attached automatically when the EKS cluster is created. It depends on the EKS configuration:
resource "aws_eks_cluster" "lab" {
  vpc_config {
    security_group_ids     = [aws_security_group.eks_control_plane_sgr.id]
    endpoint_public_access = true
    public_access_cidrs    = ["0.0.0.0/0"]
  }
}
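This is exactly the kind of setting a pre-deployment IaC scanner flags. As a toy illustration of what a Checkov-style check does (not Checkov's actual implementation, which parses HCL properly rather than matching text), a minimal check might look like:

```python
import re

# Trimmed copy of the cluster definition above.
HCL = '''
resource "aws_eks_cluster" "lab" {
  vpc_config {
    endpoint_public_access = true
    public_access_cidrs    = ["0.0.0.0/0"]
  }
}
'''

def check_eks_public_endpoint(hcl):
    """Flag EKS clusters whose API endpoint is open to the whole internet."""
    findings = []
    if re.search(r'endpoint_public_access\s*=\s*true', hcl) and "0.0.0.0/0" in hcl:
        findings.append("EKS API server endpoint is reachable from 0.0.0.0/0")
    return findings

print(check_eks_public_endpoint(HCL))
```

Checkov ships hundreds of such policies and runs them against the Terraform plan in the pipeline, before anything reaches AWS.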
2.4. Threat detection and monitoring
SecurityHub centralizes alerts from many sources for easy tracking
2.5. Alert and report
3. Implementation
Pull the source code from this repository: TungNT106 - Repos
The repository contains 3 main branches:
application: the demo application source code and deployment script.
infra: the infrastructure as code.
destroy-infra: destroy AWS infrastructure after use.
Let’s check out the infra branch and deploy the infrastructure.
While the pipeline is running, we will receive confirmation emails from AWS SNS like this. Just confirm the subscription to get notifications about new findings and a daily report.
We can view the security scan report directly in Azure DevOps.
Click on a finding to view details.
Then check out the application branch and deploy the demo application.
The security scan report has the same structure as in the infrastructure pipeline, except it gains a “Trivy Security Scans” section.
Our demo application is working.
~ $ curl http://lab-eks-alb-1409101756.ap-southeast-1.elb.amazonaws.com/health
Server is healthy
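The demo app's actual source lives in the application branch; for readers who just want the shape of that /health endpoint, here is a hypothetical stand-in (not the real app code) using only the Python standard library:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading
import urllib.request

class HelloApp(BaseHTTPRequestHandler):
    """Hypothetical hello-app: answers the ALB health check on /health."""

    def do_GET(self):
        if self.path == "/health":
            body = b"Server is healthy"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), HelloApp)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

body_text = urllib.request.urlopen(f"http://127.0.0.1:{port}/health").read().decode()
print(body_text)  # prints: Server is healthy
server.shutdown()
```

In the lab, the ALB target group points at this endpoint to decide whether a pod is healthy.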
Now let’s verify that threat detection and monitoring services work. We focus on open-source security tools.
Open CloudShell. Update the kubectl configuration with the command:
~ $ aws eks update-kubeconfig --name lab-eks --region ap-southeast-1
Updated context arn:aws:eks:ap-southeast-1:917566871600:cluster/lab-eks in /home/cloudshell-user/.kube/config
Ensure all pods are ready
~ $ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
amazon-guardduty aws-guardduty-agent-fkzhp 1/1 Running 0 94m
amazon-guardduty aws-guardduty-agent-nxt57 1/1 Running 0 94m
default hello-app-7677c6d695-xd82l 1/1 Running 0 19m
default hello-app-sandboxed-5d5fd9f9df-rwtg9 1/1 Running 0 19m
falco-gvisor falco-gvisor-falcosidekick-66d74d76dd-pthdk 1/1 Running 0 48m
falco-gvisor falco-gvisor-falcosidekick-66d74d76dd-w8zrz 1/1 Running 0 48m
falco-gvisor falco-gvisor-j66z4 2/2 Running 0 48m
falco-gvisor falco-gvisor-tctnc 2/2 Running 0 48m
falco falco-falcosidekick-79778db95c-9m2kt 1/1 Running 0 48m
falco falco-falcosidekick-79778db95c-qj6bm 1/1 Running 0 48m
falco falco-hsmzv 2/2 Running 0 48m
falco falco-zccgc 2/2 Running 0 48m
kube-system aws-node-tjgxr 2/2 Running 0 94m
kube-system aws-node-x8nqh 2/2 Running 0 94m
kube-system coredns-68bb4d6745-wcc2g 1/1 Running 0 97m
kube-system coredns-68bb4d6745-ws8bn 1/1 Running 0 97m
kube-system eks-node-monitoring-agent-9nd8b 1/1 Running 0 93m
kube-system eks-node-monitoring-agent-qlmz8 1/1 Running 0 93m
kube-system eks-pod-identity-agent-2z2fs 1/1 Running 0 93m
kube-system eks-pod-identity-agent-djhkd 1/1 Running 0 93m
kube-system kube-proxy-fsvx5 1/1 Running 0 94m
kube-system kube-proxy-npnfh 1/1 Running 0 94m
kube-system metrics-server-849ccd88cb-9gktf 1/1 Running 0 93m
kube-system metrics-server-849ccd88cb-jw6dp 1/1 Running 0 93m
Let me explain.
The amazon-guardduty add-on is installed by AWS GuardDuty because we let it self-manage:
resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
  additional_configuration {
    name   = "EKS_ADDON_MANAGEMENT"
    status = "ENABLED"
  }
}
I deployed two different instances of the demo app: hello-app-7677c6d695-xd82l runs on runc, and hello-app-sandboxed-5d5fd9f9df-rwtg9 runs on runsc (gVisor). The gVisor runtime is installed on each node via a cloud-init script, and Falco is deployed as a DaemonSet by Helm.
The Falco DaemonSet in the falco namespace is the default version that monitors all containers. On the other hand, falco-gvisor runs independently and monitors sandboxed containers only. We actually need only the falco-gvisor version, but I demonstrate both.
Take a look at a normal Falco pod
~ $ kubectl logs -n falco falco-zccgc
Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falco-driver-loader (init), falcoctl-artifact-install (init)
Mon Apr 14 08:34:34 2025: Falco version: 0.40.0 (x86_64)
Mon Apr 14 08:34:34 2025: Falco initialized with configuration files:
Mon Apr 14 08:34:34 2025: /etc/falco/falco.yaml | schema validation: ok
Mon Apr 14 08:34:34 2025: System info: Linux version 6.1.131-143.221.amzn2023.x86_64 (mockbuild@ip-10-0-47-218) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.41-50.amzn2023.0.2) #1 SMP PREEMPT_DYNAMIC Mon Mar 24 15:35:21 UTC 2025
Mon Apr 14 08:34:34 2025: Loading rules from:
Mon Apr 14 08:34:34 2025: /etc/falco/falco_rules.yaml | schema validation: ok
Mon Apr 14 08:34:34 2025: Hostname value has been overridden via environment variable to: ip-10-0-135-59.ap-southeast-1.compute.internal
Mon Apr 14 08:34:34 2025: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Mon Apr 14 08:34:34 2025: Starting health webserver with threadiness 2, listening on 0.0.0.0:8765
Mon Apr 14 08:34:34 2025: Loaded event sources: syscall
Mon Apr 14 08:34:34 2025: Enabled event sources: syscall
Mon Apr 14 08:34:34 2025: Opening 'syscall' source with BPF probe. BPF probe path: /root/.falco/falco-bpf.o
What we got:
Falco initialized successfully with the configuration file /etc/falco/falco.yaml.
Rules were loaded successfully from /etc/falco/falco_rules.yaml.
Falco uses the eBPF kernel driver to monitor the syscall source.
Check the falco-driver-loader container:
~ $ kubectl logs -n falco falco-zccgc -c falco-driver-loader
* Setting up /usr/src links from host
2025-04-14 08:34:27 INFO Running falcoctl driver config
├ name: falco
├ version: 8.0.0+driver
├ type: ebpf
├ host-root: /host
└ repos: https://download.falco.org/driver
2025-04-14 08:34:27 INFO Storing falcoctl driver config
2025-04-14 08:34:27 INFO Running falcoctl driver install
├ driver version: 8.0.0+driver
├ driver type: ebpf
├ driver name: falco
├ compile: true
├ download: true
├ target: amazonlinux2023
├ arch: x86_64
├ kernel release: 6.1.131-143.221.amzn2023.x86_64
└ kernel version: #1 SMP PREEMPT_DYNAMIC Mon Mar 24 15:35:21 UTC 2025
2025-04-14 08:34:27 INFO Removing eBPF probe symlink
└ path: /root/.falco/falco-bpf.o
2025-04-14 08:34:27 INFO Trying to download a driver.
└ url: https://download.falco.org/driver/8.0.0%2Bdriver/x86_64/falco_amazonlinux2023_6.1.131-143.221.amzn2023.x86_64_1.o
2025-04-14 08:34:27 INFO Driver downloaded.
└ path: /root/.falco/8.0.0+driver/x86_64/falco_amazonlinux2023_6.1.131-143.221.amzn2023.x86_64_1.o
2025-04-14 08:34:27 INFO Symlinking eBPF probe
├ src: /root/.falco/8.0.0+driver/x86_64/falco_amazonlinux2023_6.1.131-143.221.amzn2023.x86_64_1.o
└ dest: /root/.falco/falco-bpf.o
2025-04-14 08:34:27 INFO eBPF probe symlinked
Now we can be sure Falco is using the driver correctly. Check the falcosidekick status:
~ $ kubectl logs -n falco falco-falcosidekick-79778db95c-qj6bm
2025/04/14 08:34:37 [INFO] : Falcosidekick version: 2.31.1
2025/04/14 08:34:37 [INFO] : Enabled Outputs: [AWSLambda]
2025/04/14 08:34:37 [INFO] : Falcosidekick is up and listening on :2801
The logs implicitly confirm that FalcoSidekick successfully assumed the required role and integrated with AWS Lambda. From now on, FalcoSidekick hooks Falco’s findings and sends them to the corresponding AWS Lambda function.
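The Lambda side of this hook is essentially a format translation: Falco emits a JSON event, and the function reshapes it into an AWS Security Finding Format (ASFF) finding before importing it into Security Hub via the BatchImportFindings API. The sketch below shows only the reshaping step; the field choices and severity mapping are illustrative, not the exact falco_handler code.

```python
import hashlib

def falco_to_asff(event, account_id, region):
    """Reshape a Falco alert into a minimal ASFF finding dict (illustrative)."""
    severity = {"Critical": "CRITICAL", "Error": "HIGH",
                "Warning": "MEDIUM", "Notice": "LOW"}.get(event["priority"], "INFORMATIONAL")
    return {
        "SchemaVersion": "2018-10-08",
        # Deterministic Id so repeated alerts update the same finding.
        "Id": hashlib.sha256(event["output"].encode()).hexdigest(),
        "ProductArn": f"arn:aws:securityhub:{region}:{account_id}:product/{account_id}/default",
        "GeneratorId": event["rule"],
        "AwsAccountId": account_id,
        "Types": ["Unusual Behaviors/Container"],
        "CreatedAt": event["time"],
        "UpdatedAt": event["time"],
        "Severity": {"Label": severity},
        "Title": event["rule"],
        "Description": event["output"],
        "Resources": [{"Type": "Container", "Id": event["output_fields"]["container.id"]}],
    }

# A trimmed Falco event, as FalcoSidekick would POST it to the Lambda.
sample = {
    "rule": "Terminal shell in container",
    "priority": "Notice",
    "time": "2025-04-14T09:20:04Z",
    "output": "A shell was spawned in a container...",
    "output_fields": {"container.id": "803c85bca405"},
}
finding = falco_to_asff(sample, "917566871600", "ap-southeast-1")
print(finding["Severity"], finding["Title"])
```

Note the trailing "default" in the ProductArn: findings imported by your own integrations must use the account's default product ARN, which matters for the point about the Default name below.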
Let’s see how the Falco default version protects a normal container. We spawn a shell inside the application container.
~ $ kubectl exec -it hello-app-7677c6d695-xd82l -- /bin/sh
~ # whoami
root
We can see that FalcoSidekick is working:
~ $ kubectl logs -n falco falco-falcosidekick-79778db95c-qj6bm
2025/04/14 08:34:37 [INFO] : Falcosidekick version: 2.31.1
2025/04/14 08:34:37 [INFO] : Enabled Outputs: [AWSLambda]
2025/04/14 08:34:37 [INFO] : Falcosidekick is up and listening on :2801
2025/04/14 09:20:04 [INFO] : AWS Lambda - Invoke OK (200)
Security Hub receives a new finding from Falco, alongside other findings from AWS native tools like Amazon Inspector.
The Default value here is not a default setting or a random name. Please check the AWS Security Hub documentation and the example source code in the falco_handler Lambda. Any misconfiguration may break the integration between Falco and Security Hub.
The new finding is also sent to the email channel.
The Falco default version works. Now, check the Falco-gVisor version.
First, we need to step back a bit to understand how the Falco + gVisor integration works under the hood.
Falco is an engine that evaluates system call data and raises alerts when the data matches its rules. Where does the data come from? From the Falco kernel driver.
The situation changes when gVisor comes into play. gVisor isolates the container from the host Linux kernel and limits the system calls available. So, if Falco uses the standard kernel driver, the collected data is not helpful; Falco will spam alerts like this.
On the other hand, gVisor has a component called Sentry that can act as a kernel driver. Instead of using the built-in driver, Falco uses gVisor as a driver: Falco opens a Unix domain socket and collects data from it, while on the gVisor side, Sentry connects to this socket and sends data to Falco. This connection is called a Sink.
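The transport mechanics of that Sink can be sketched in a few lines of Python: one side plays Falco (binding a Unix domain socket and reading from it), the other plays Sentry (connecting and writing an event). This is only a toy of the socket handshake, not gVisor's actual wire protocol.

```python
import os
import socket
import tempfile
import threading

# Stand-in for /run/containerd/runsc/falco.sock.
sock_path = os.path.join(tempfile.mkdtemp(), "falco.sock")

# "Falco" side: open the Unix domain socket and wait for the sink to connect.
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(sock_path)
srv.listen(1)

received = []

def falco_side():
    conn, _ = srv.accept()
    with conn:
        received.append(conn.recv(1024).decode())

t = threading.Thread(target=falco_side)
t.start()

# "Sentry" side: connect to the sink endpoint and stream a syscall event.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sentry:
    sentry.connect(sock_path)
    sentry.sendall(b'{"syscall": "execve", "path": "/bin/sh"}')

t.join()
srv.close()
print(received[0])
```

The real integration works the same way around: Falco owns falco.sock, and Sentry pushes sandboxed containers' syscall data into it.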
We need to ensure the gVisor runtime is configured and used correctly. Here is Falco's configuration:
driver:
  enabled: true
  kind: gvisor
  gvisor:
    runsc:
      path: /usr/local/bin
      root: /run/containerd/runsc
      config: /etc/containerd/config.toml
Let’s connect to the node via EC2 Instance Connect Endpoint to check the real configuration.
[root@ip-10-0-148-41 ~]# ls /etc/containerd/
base-runtime-spec.json config-backup.toml config.toml
The base-runtime-spec.json and config-backup.toml are default configuration files. Check whether config.toml is modified correctly:
[root@ip-10-0-148-41 ~]# cat /etc/containerd/config.toml
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
[grpc]
address = "/run/containerd/containerd.sock"
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "localhost/kubernetes/pause"
enable_cdi = true
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
base_runtime_spec = "/etc/containerd/base-runtime-spec.json"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = "/usr/sbin/runc"
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
runtime_type = "io.containerd.runsc.v1"
pod-init-config = "/run/containerd/runsc/pod-init.json"
Look at the content of /run/containerd/runsc/pod-init.json. The Sink connection between Falco and gVisor is /run/containerd/runsc/falco.sock.
{
  "sinks": [
    {
      "config": {
        "endpoint": "/run/containerd/runsc/falco.sock",
        "retries": 3
      },
      "ignore_setup_error": true,
      "name": "remote"
    }
  ]
}
Note that we only add the runsc runtime_type entry to config.toml; the pod-init-config file is auto-generated and comes from another source.
Check whether the runsc root path exists. We find some .state and .sock files, which means some containers are using runsc.
[root@ip-10-0-148-41 ~]# ls /run/containerd/runsc/k8s.io/
a07f4d498111ebe586c50f1c274c935111489b19715d747b38fde31e10f329f1_sandbox:e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35.lock
a07f4d498111ebe586c50f1c274c935111489b19715d747b38fde31e10f329f1_sandbox:e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35.state
e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35_sandbox:e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35.lock
e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35_sandbox:e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35.state
runsc-e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35.sock
We can see two containers using the same image 917566871600.dkr.ecr.ap-southeast-1.amazonaws.com/lab/hello-app:hello-app-build-1623 but running on different runtimes. Take a look at containers 803c85bca4… and a07f4d4981….
[root@ip-10-0-148-41 ~]# sudo ctr -n k8s.io container ls
CONTAINER IMAGE RUNTIME
00c695a4210d8bf87791b4346e956e59ab33a6724711797033494c449adb8b49 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/amazon/aws-network-policy-agent:v1.1.6-eksbuild.1 io.containerd.runc.v2
071a446aada7a7799ff8ea90c85fc2e8690239130cdc9c9d227fb365b3e3a99d docker.io/falcosecurity/falcosidekick:2.31.1 io.containerd.runc.v2
2c8a2152df26778b8dc6f7fc3f6d5720ccc3fd58e40f55b596e61fe3961783cf 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
32116969400baa71075b12ad6e89c0166d269f6457d84f65f570a39c71e5c3e2 docker.io/falcosecurity/falco:0.40.0-debian io.containerd.runc.v2
445e8305cd56cc8e64a399cbeeefb3f8135c947a4ad21315ea8626e30dba8287 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/metrics-server:v0.7.2-eksbuild.2 io.containerd.runc.v2
450df4f61bfff5a499f279d6075968321f6527cb8c6f4aff29e6f42896b38d0b 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/aws-guardduty-agent:v1.9.0 io.containerd.runc.v2
4d05d1828f0aefc57e1f0796d5748e129c5515464bbeefce08c3f6e11ca53ad1 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
4e4e83c9729d16826d66625d8cc11a8f60757b5d6e0bffb34ff9202f93aebab4 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
504e87f64df0b3501929d94cb98871fbb1891a8f7182e77b4cf5e55e14555809 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/amazon-k8s-cni-init:v1.19.2-eksbuild.1 io.containerd.runc.v2
56fe9b55c88654cbf51d4d046190d3a82cece66579b5250e51608e7fe383b3e4 docker.io/falcosecurity/falcoctl:0.11.0 io.containerd.runc.v2
5786a49427b1e934d684e3ba7d1c3c5bd2c60bd90a827665a5f5f87c5ba56833 docker.io/falcosecurity/falcoctl:0.11.0 io.containerd.runc.v2
618cabdd25dbcd835fa1866fa03c4c3472257075d259556062452f33faf66ad0 docker.io/falcosecurity/falco-driver-loader:0.40.0 io.containerd.runc.v2
78c58ce2bf4961f73ad7590098d4bb69d12701a7406a1d4e7c02c52c00c0583f docker.io/falcosecurity/falcosidekick:2.31.1 io.containerd.runc.v2
803c85bca405bd9ccad7343b295e84b5a331358a722e69bea61bce308e2025d7 917566871600.dkr.ecr.ap-southeast-1.amazonaws.com/lab/hello-app:hello-app-build-1623 io.containerd.runc.v2
80473a9eeeeba90e51e1191c93244a8a20852e18e984f0ab308eac6e267d82ab 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/eks-node-monitoring-agent:v1.2.0-eksbuild.1 io.containerd.runc.v2
82d26f7053e0046b20f98ba241f6fdfdfd85a176e4f601449a1e9997a303f27a docker.io/falcosecurity/falcoctl:0.11.0 io.containerd.runc.v2
8db04f6715c0ab70f33a11d3cfda33f06ca75a28fe8c078d6ba3134b2cfc4df6 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/kube-proxy:v1.32.0-minimal-eksbuild.2 io.containerd.runc.v2
9e74b7c3072373e5028cf4bca8849b1fc0b88757bcd97f2804136a4321864aac 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
a07f4d498111ebe586c50f1c274c935111489b19715d747b38fde31e10f329f1 917566871600.dkr.ecr.ap-southeast-1.amazonaws.com/lab/hello-app:hello-app-build-1623 io.containerd.runsc.v1
a5a00c183537b3330798353975b400704d3fffd2dc321cb6fd211247391a37f2 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/amazon-k8s-cni:v1.19.2-eksbuild.1 io.containerd.runc.v2
a73e226ca0822bdd5abe33b7aeb721581958f54054ebc68d104d4c4bd13063db 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
b23193b6d3c322d0fd24342e9b9ffe8928c51d16ba82342d4251dd175d5b4468 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/eks-pod-identity-agent:v0.1.21 io.containerd.runc.v2
b530e0f18f48322545f591b801d638ea1a6d818441d61bccd72cca62e01d7d46 docker.io/falcosecurity/falcoctl:0.11.0 io.containerd.runc.v2
c489b92bf8a51a74b2fb3122bcd6b109b958d61952a49c304cf68e298cee62ac 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/eks-pod-identity-agent:v0.1.21 io.containerd.runc.v2
c59cea7fd8637c428683b298fd88a33ed3be533f1e24c74866f23d9c60864a54 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
d0fcda46418f3f23043735b321833976d9855eda9d71515e8905c8cdd3ecc4c0 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
d1a3839757a918db69f536fe9131feac860584854c22e6d62024fe46985eaefd 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
d5d9bc952f53f95c468d3c5b608ecc05917581000585719aac98978a3b206e66 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
dd2b7d322686a8e8c67b23a47fb3f18db703d69268fc4d9c46eba58b9f389259 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
dd54d707ccb39a0d76eee6cb5802767c292428850bd7f51cf3259503107a303a 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runsc.v1
ef10441d987d81d66ad597c5cb93ee91530a63a7a4ac109ec266518044cb9f66 docker.io/falcosecurity/falco:0.40.0-debian io.containerd.runc.v2
f0f8235215e6aba2adb84490fb3bbc4dcd22953a2dba35380fc5d17e20290f60 docker.io/falcosecurity/falco:0.40.0-debian io.containerd.runc.v2
All the other containers run on io.containerd.runc.v2, which is expected behavior. We have now ensured that the gVisor runtime is correctly configured and used.
Come back to the CloudShell to check the Falco-gVisor pod
~ $ kubectl logs -n falco-gvisor falco-gvisor-j66z4
Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falco-gvisor-init (init), falcoctl-artifact-install (init)
Mon Apr 14 08:34:35 2025: Falco version: 0.40.0 (x86_64)
Mon Apr 14 08:34:35 2025: CLI args: /usr/bin/falco -pk
Mon Apr 14 08:34:35 2025: Falco initialized with configuration files:
Mon Apr 14 08:34:35 2025: /etc/falco/falco.yaml | schema validation: ok
Mon Apr 14 08:34:35 2025: [libs]: Cannot read host init process proc root: 13
Mon Apr 14 08:34:35 2025: [libs]: Cannot read host init process proc root: 13
Mon Apr 14 08:34:35 2025: Enabled container engine 'docker'
Mon Apr 14 08:34:35 2025: Enabled container engine 'CRI'
Mon Apr 14 08:34:35 2025: Enabled container runtime socket at '/run/containerd/containerd.sock' via config file
Mon Apr 14 08:34:35 2025: Enabled container runtime socket at '/run/crio/crio.sock' via config file
Mon Apr 14 08:34:35 2025: Configured rules filenames:
Mon Apr 14 08:34:35 2025: /etc/falco/falco_rules.yaml
Mon Apr 14 08:34:35 2025: Loading rules from:
Mon Apr 14 08:34:35 2025: /etc/falco/falco_rules.yaml | schema validation: ok
Mon Apr 14 08:34:35 2025: Hostname value has been overridden via environment variable to: ip-10-0-135-59.ap-southeast-1.compute.internal
Mon Apr 14 08:34:35 2025: Watching file '/etc/falco/falco.yaml'
Mon Apr 14 08:34:35 2025: Watching file '/etc/falco/falco_rules.yaml'
Mon Apr 14 08:34:35 2025: (19) syscalls in rules: connect, dup, dup2, dup3, execve, execveat, finit_module, init_module, link, linkat, open, openat, openat2, ptrace, sendmsg, sendto, socket, symlink, symlinkat
Mon Apr 14 08:34:35 2025: +(53) syscalls (Falco's state engine set of syscalls): accept, accept4, bind, capset, chdir, chroot, clone, clone3, close, creat, epoll_create, epoll_create1, eventfd, eventfd2, fchdir, fcntl, fork, getsockopt, inotify_init, inotify_init1, io_uring_setup, memfd_create, mount, open_by_handle_at, pidfd_getfd, pidfd_open, pipe, pipe2, prctl, prlimit, procexit, recvfrom, recvmmsg, recvmsg, sendmmsg, setgid, setpgid, setregid, setresgid, setresuid, setreuid, setrlimit, setsid, setuid, shutdown, signalfd, signalfd4, socketpair, timerfd_create, umount, umount2, userfaultfd, vfork
Mon Apr 14 08:34:35 2025: (72) syscalls selected in total (final set): accept, accept4, bind, capset, chdir, chroot, clone, clone3, close, connect, creat, dup, dup2, dup3, epoll_create, epoll_create1, eventfd, eventfd2, execve, execveat, fchdir, fcntl, finit_module, fork, getsockopt, init_module, inotify_init, inotify_init1, io_uring_setup, link, linkat, memfd_create, mount, open, open_by_handle_at, openat, openat2, pidfd_getfd, pidfd_open, pipe, pipe2, prctl, prlimit, procexit, ptrace, recvfrom, recvmmsg, recvmsg, sendmmsg, sendmsg, sendto, setgid, setpgid, setregid, setresgid, setresuid, setreuid, setrlimit, setsid, setuid, shutdown, signalfd, signalfd4, socket, socketpair, symlink, symlinkat, timerfd_create, umount, umount2, userfaultfd, vfork
Mon Apr 14 08:34:35 2025: Starting health webserver with threadiness 2, listening on 0.0.0.0:8765
Mon Apr 14 08:34:35 2025: Enabled rules:
Mon Apr 14 08:34:35 2025: Directory traversal monitored file read
Mon Apr 14 08:34:35 2025: Read sensitive file trusted after startup
Mon Apr 14 08:34:35 2025: Read sensitive file untrusted
Mon Apr 14 08:34:35 2025: Run shell untrusted
Mon Apr 14 08:34:35 2025: System user interactive
Mon Apr 14 08:34:35 2025: Terminal shell in container
Mon Apr 14 08:34:35 2025: Contact K8S API Server From Container
Mon Apr 14 08:34:35 2025: Netcat Remote Code Execution in Container
Mon Apr 14 08:34:35 2025: Search Private Keys or Passwords
Mon Apr 14 08:34:35 2025: Clear Log Activities
Mon Apr 14 08:34:35 2025: Remove Bulk Data from Disk
Mon Apr 14 08:34:35 2025: Create Symlink Over Sensitive Files
Mon Apr 14 08:34:35 2025: Create Hardlink Over Sensitive Files
Mon Apr 14 08:34:35 2025: Packet socket created in container
Mon Apr 14 08:34:35 2025: Redirect STDOUT/STDIN to Network Connection in Container
Mon Apr 14 08:34:35 2025: Linux Kernel Module Injection Detected
Mon Apr 14 08:34:35 2025: Debugfs Launched in Privileged Container
Mon Apr 14 08:34:35 2025: Detect release_agent File Container Escapes
Mon Apr 14 08:34:35 2025: PTRACE attached to process
Mon Apr 14 08:34:35 2025: PTRACE anti-debug attempt
Mon Apr 14 08:34:35 2025: Find AWS Credentials
Mon Apr 14 08:34:35 2025: Execution from /dev/shm
Mon Apr 14 08:34:35 2025: Drop and execute new binary in container
Mon Apr 14 08:34:35 2025: Disallowed SSH Connection Non Standard Port
Mon Apr 14 08:34:35 2025: Fileless execution via memfd_create
Mon Apr 14 08:34:35 2025: (25) enabled rules in total
Mon Apr 14 08:34:35 2025: Loaded event sources: syscall
Mon Apr 14 08:34:35 2025: Enabled event sources: syscall
Mon Apr 14 08:34:35 2025: Opening event source 'syscall'
Mon Apr 14 08:34:35 2025: Opening 'syscall' source with gVisor. Configuration path: /gvisor-config/pod-init.json
Mon Apr 14 08:34:35 2025: [libs]: Trying to open the right engine!
The logs are verbose because I enabled debug mode for Falco’s logging. They give us some useful information:
Falco is configured and its rules are loaded successfully.
The enabled rule list includes the Terminal shell in container rule.
Falco is using gVisor (Sentry) as its kernel driver instead of eBPF.
So Falco seems ready. Ensure FalcoSidekick is ready too:
~ $ kubectl logs -n falco-gvisor falco-gvisor-falcosidekick-66d74d76dd-pthdk
2025/04/14 08:34:54 [INFO] : Falcosidekick version: 2.31.1
2025/04/14 08:34:54 [INFO] : Enabled Outputs: [AWSLambda]
2025/04/14 08:34:54 [INFO] : Falcosidekick is up and listening on :2801
One thing to notice: (*) the configuration path in the logs (Configuration path: /gvisor-config/pod-init.json) differs from the value in config.toml (pod-init-config = "/run/containerd/runsc/pod-init.json"). To understand why, we need to check the falco-gvisor-init container logs:
~ $ kubectl logs -n falco-gvisor falco-gvisor-j66z4 -c falco-gvisor-init
* Configuring Falco+gVisor integration....
* Checking for /host/etc/containerd/config.toml file...
* Generating the Falco configuration...
2025-04-14T08:34:27+0000: Falco version: 0.40.0 (x86_64)
2025-04-14T08:34:27+0000: Falco initialized with configuration files:
2025-04-14T08:34:27+0000: /etc/falco/falco.yaml | schema validation: ok
2025-04-14T08:34:27+0000: System info: Linux version 6.1.131-143.221.amzn2023.x86_64 (mockbuild@ip-10-0-47-218) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.41-50.amzn2023.0.2) #1 SMP PREEMPT_DYNAMIC Mon Mar 24 15:35:21 UTC 2025
* Setting the updated Falco configuration to /gvisor-config/pod-init.json...
* Falco+gVisor correctly configured.
We found the key information: Falco+gVisor is correctly configured, with the config path /gvisor-config/pod-init.json. Falco confirmed that the Falco-gVisor integration is in place. But where do these logs come from? Digging into the Falco source code on GitHub, I found an interesting file, helpers.tpl, which bootstraps the Falco-gVisor integration when Falco is deployed. The file is very verbose; we only need to focus on a few lines:
/usr/bin/falco --gvisor-generate-config=${root}/falco.sock > /host${root}/pod-init.json
sed 's/"endpoint" : "\/run/"endpoint" : "\/host\/run/' /host${root}/pod-init.json > /gvisor-config/pod-init.json
That explains the (*). The integration between Falco and gVisor is perfectly configured. Let’s spawn a shell inside the isolated container.
~ $ kubectl exec -it hello-app-sandboxed-5d5fd9f9df-rwtg9 -- /bin/sh
~ # whoami
root
But NO alert fired.
Currently, the Falco-gVisor integration works on:
Container engines only: Docker or containerd.
Google Kubernetes Engine (supported out of the box).
Other Kubernetes environments, like Minikube.
The Falco-gVisor integration does not work on AWS EKS.
Root cause: gVisor's Sentry cannot send data to Falco via falco.sock.
Every day at noon UTC+0 (7 PM in Vietnam), we receive a daily security report in our mailbox.
4. Limitations and improvements needed
The AWS Config rules in the demo source code are AWS-managed rules only. We need to write more custom rules for the EKS cluster to improve security.
Among the AWS Config rules, we have this one:
resource "aws_config_config_rule" "eks_endpoint_no_public_access" {
  name        = "eks-endpoint-no-public-access"
  description = "Ensures EKS cluster endpoints are not publicly accessible to minimize unauthorized access risks."
  source {
    owner             = "AWS"
    source_identifier = "EKS_ENDPOINT_NO_PUBLIC_ACCESS"
  }
  scope {
    compliance_resource_types = ["AWS::EKS::Cluster"]
  }
}
But for now, the Kubernetes API server endpoint is publicly accessible. We need to set up a VPN solution to resolve this.
The Falco-gVisor integration is not working as of now. I’m preparing to open an issue in the Falco GitHub repository.
Written by Tùng Nguyễn Thanh