Fortifying AWS EKS: A Comprehensive Guide to Securing Containerized Workloads with AWS and Open-Source Tools


1. Introduction
Securing containerized workloads in today's cloud-centric landscape is paramount to ensuring robust and reliable application deployment. Amazon Web Services (AWS) Elastic Kubernetes Service (EKS) provides a scalable platform for running containerized applications, but it demands a comprehensive security strategy to protect against evolving threats. This document outlines a solution for securing AWS EKS workloads by integrating native AWS security services with open-source tools. The approach leverages Azure DevOps for CI/CD pipelines, Terraform for Infrastructure as Code (IaC), and a combination of AWS tools (Security Hub, GuardDuty, Inspector, Config) and open-source solutions (Checkov, Trivy, gVisor, Falco, FalcoSidekick) to address vulnerabilities, misconfigurations, and runtime threats. This project aims to provide a scalable, secure, and efficient framework for deploying and monitoring containerized applications on AWS, while identifying limitations and areas for improvement.
2. Architecture design
2.1. Technology choice
2.1.1. Azure DevOps for CI/CD
Why Chosen:
Team already uses Azure DevOps, so there is no learning curve.
AWS CodeCommit is being deprecated, making Azure DevOps a solid replacement.
The AWS Toolkit for Azure DevOps integrates well with AWS for EKS deployments.
Trade-offs:
The Kubernetes API server must be exposed to the internet for Azure DevOps to deploy to EKS, which is a security risk.
Setting up a VPN for secure access is safer, but it adds configuration complexity.
2.1.2. Terraform for Infrastructure as Code
Why Chosen:
Works across clouds, unlike CloudFormation or CDK, giving flexibility if we move beyond AWS.
Huge community and module library speed up setup compared to CloudFormation’s custom templates.
Easier to version control and review than CDK’s programmatic code.
Trade-offs:
State files need secure storage (e.g., S3), which is trickier than CloudFormation’s built-in state management.
HCL lacks the logic of real programming languages like CDK’s Python, making complex configs harder.
AWS features hit CloudFormation/CDK first, so Terraform might lag for new services.
2.1.3. Native AWS Security Tools
What I chose:
Security Hub: Centralizes alerts from many sources for easy tracking.
GuardDuty: Detects threats like weird API calls with little setup.
Inspector: Scans EKS and EC2 for vulnerabilities.
AWS Config: Checks resource configs for compliance.
Why chosen:
Integrate seamlessly with each other and with the rest of AWS.
Provide runtime monitoring and threat detection out of the box.
Trade-offs:
Costs can spike, especially for AWS Config, if misconfigured.
Limited to AWS scope, missing deep container or app-level issues.
2.1.4. Open-Source Tools
What I chose:
Checkov: Scans Terraform code, Azure Pipelines scripts, Dockerfiles, and more for misconfigurations (e.g., open ports, weak IAM policies) before deployment.
Trivy: Scans container images for vulnerabilities, ensuring EKS workloads are clean before Amazon Inspector comes into play.
gVisor: Isolates containers from the Linux host, hardening both against container escapes and privilege escalation.
Falco: Detects malicious behavior in hosts and containers based on a configurable rule set.
FalcoSidekick: Forwards Falco's real-time alerts to third parties, especially AWS Lambda.
Why chosen:
Pipeline fit: Checkov scans Terraform for misconfigs and Trivy checks container images, catching issues early in Azure DevOps.
Extra security: gVisor sandboxes containers against kernel exploits; Falco detects runtime threats in real-time.
Fix AWS gaps: AWS tools (Security Hub, GuardDuty, Inspector, Config) miss container and runtime details. Checkov secures IaC, Trivy ensures clean images, gVisor blocks low-level attacks, and Falco spots pod anomalies.
Trade-offs:
Although these tools have large communities, not all of them are equally well-documented.
Checkov and Trivy slow the pipeline, but the overhead is acceptable. See Alternative open-source security tools for pipeline protection.
gVisor: Hurts performance and has a complex setup.
Falco: Complex setup when using the FalcoSidekick + AWS Lambda output.
2.2. CI/CD
2.2.1. Infrastructure pipeline
Overview of the infrastructure pipeline
Infrastructure pipeline - flow chart
IAM role preparation steps:
Create a LabCICDFullAccess policy that grants full permissions to all required resources, then attach it to LabCICDInfraRole.
Create a CloudTrail trail to record LabCICDInfraRole activity.
After creating and destroying the full infrastructure, generate a policy based on the recorded CloudTrail events.
Review the generated policy, attach it to LabCICDInfraRole, and detach LabCICDFullAccess.
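The last two preparation steps boil down to folding the recorded CloudTrail events into a least-privilege policy draft. A minimal sketch of that idea in Python follows; the event shape and the helper name are illustrative, not the exact tooling used in the lab (IAM Access Analyzer can also generate such policies from CloudTrail automatically).

```python
import json

def policy_from_cloudtrail_events(events):
    """Fold CloudTrail events into a least-privilege IAM policy draft.

    Each event contributes one action "<service>:<EventName>", where the
    service prefix is derived from eventSource (e.g. "eks.amazonaws.com" -> "eks").
    """
    actions = sorted({
        f'{e["eventSource"].split(".")[0]}:{e["eventName"]}'
        for e in events
    })
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": "*"}],
    }

# Hypothetical events recorded while LabCICDInfraRole built the lab stack.
events = [
    {"eventSource": "eks.amazonaws.com", "eventName": "CreateCluster"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "CreateVpc"},
    {"eventSource": "eks.amazonaws.com", "eventName": "DeleteCluster"},
]
print(json.dumps(policy_from_cloudtrail_events(events), indent=2))
```

A real pass would also narrow the Resource ARNs instead of leaving `"*"`; the review step above exists precisely for that.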
2.2.2. Application pipeline
Overview of application pipeline
The application pipeline assumes a different role (LabCICDApplicationRole). Because this role only needs access to ECR and EKS, we simply attach the LabApplicationPublishingPolicy below:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ecr:GetAuthorizationToken"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:BatchCheckLayerAvailability",
        "ecr:GetDownloadUrlForLayer",
        "ecr:DescribeRepositories",
        "ecr:ListImages",
        "ecr:DescribeImages",
        "ecr:BatchGetImage",
        "ecr:InitiateLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:CompleteLayerUpload",
        "ecr:PutImage"
      ],
      "Resource": "arn:aws:ecr:ap-southeast-1:917566871600:repository/lab/hello-app"
    },
    {
      "Effect": "Allow",
      "Action": ["eks:DescribeCluster", "eks:AccessKubernetesApi"],
      "Resource": "arn:aws:eks:ap-southeast-1:917566871600:cluster/lab-eks"
    }
  ]
}
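As a sanity check on a policy like this, a small script can flag statements that pair actions with a wildcard resource, which is the kind of over-grant the scoped ARNs above are meant to avoid. This is a minimal sketch with the policy abbreviated to the relevant parts; the allow-list reflects that ecr:GetAuthorizationToken cannot be scoped to a resource.

```python
import json

# Abbreviated copy of LabApplicationPublishingPolicy, for illustration only.
policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Action": ["ecr:GetAuthorizationToken"], "Resource": "*"},
    {"Effect": "Allow", "Action": ["ecr:PutImage"],
     "Resource": "arn:aws:ecr:ap-southeast-1:917566871600:repository/lab/hello-app"}
  ]
}
""")

# Actions that are acceptable on Resource "*".
WILDCARD_OK = {"ecr:GetAuthorizationToken"}

def over_grants(doc):
    """Return actions allowed on Resource "*" that are not on the allow-list."""
    found = []
    for stmt in doc["Statement"]:
        if stmt["Effect"] == "Allow" and stmt["Resource"] == "*":
            found += [a for a in stmt["Action"] if a not in WILDCARD_OK]
    return found

print(over_grants(policy))  # an empty list means the policy is scoped as intended
```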
Application pipeline - flow chart
2.3. Workloads
Overview of AWS infrastructure
Why do we place worker nodes inside a private subnet and then use a NAT gateway to provide egress traffic to the internet?
Because Amazon ECR and the other Docker registries are not inside the VPC, and nodes need to pull images (e.g., the application, Falco, etc.). But placing nodes on the public internet increases the attack surface, which is not best practice.
Why do we place the Amazon EKS on both the public subnet and the private subnet?
Flexibility. The Kubernetes control plane can communicate with worker nodes (kubelet) inside the private subnet and expose the Kubernetes API server publicly if needed. In this solution, the Kubernetes API server must be reachable from Azure DevOps and even CloudShell.
But exposing the Kubernetes API server to the internet is not best practice!?
I know. But setting up a VPN solution for the pipeline and development shell introduces more complexity. For now, let's keep it simple.
What is the purpose of the EC2 Instance Connect Endpoint?
Worker nodes (EC2) live inside the private subnet, and only traffic from the worker nodes' security group is allowed on port 22. During development, we still need SSH access to the nodes from outside. That's where the EC2 Instance Connect Endpoint comes into play.
VPC configuration
Application Load Balancer forwards traffic to the target group
Security Group configuration
The rule that allows ingress traffic from anywhere to TCP 443 on the control plane's security group is attached automatically when the EKS cluster is created. It depends on the EKS configuration:
resource "aws_eks_cluster" "lab" {
  vpc_config {
    security_group_ids     = [aws_security_group.eks_control_plane_sgr.id]
    endpoint_public_access = true
    public_access_cidrs    = ["0.0.0.0/0"]
  }
}
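This is exactly the kind of setting a pre-deployment IaC scanner flags. As a toy illustration of what a Checkov-style check does (not Checkov's actual implementation, which parses HCL properly rather than matching text), a minimal check might look like:

```python
import re

# Trimmed copy of the cluster definition above.
HCL = '''
resource "aws_eks_cluster" "lab" {
  vpc_config {
    endpoint_public_access = true
    public_access_cidrs    = ["0.0.0.0/0"]
  }
}
'''

def check_eks_public_endpoint(hcl):
    """Flag EKS clusters whose API endpoint is open to the whole internet."""
    findings = []
    if re.search(r'endpoint_public_access\s*=\s*true', hcl) and "0.0.0.0/0" in hcl:
        findings.append("EKS API server endpoint is reachable from 0.0.0.0/0")
    return findings

print(check_eks_public_endpoint(HCL))
```

Checkov ships hundreds of such policies and runs them against the Terraform plan in the pipeline, before anything reaches AWS.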
2.4. Threat detection and monitoring
SecurityHub centralizes alerts from many sources for easy tracking
2.5. Alert and report
3. Implementation
Pull the source code from this repository: TungNT106 - Repos
The repository contains 3 main branches:
application: the demo application source code and deployment script.
infra: the infrastructure as code.
destroy-infra: destroy AWS infrastructure after use.
Let’s check out the infra branch and deploy the infrastructure.
While the pipeline is running, we will receive confirmation emails from AWS SNS like this. Just confirm the subscription to get notifications about new findings and a daily report.
We can view the security scan report directly in Azure DevOps.
Click on a finding to view details.
Then check out the application branch and deploy the demo application.
The security scan report has the same structure as in the infrastructure pipeline, except it gains a “Trivy Security Scans” section.
Our demo application is working.
~ $ curl http://lab-eks-alb-1409101756.ap-southeast-1.elb.amazonaws.com/health
Server is healthy
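The demo app's actual source lives in the application branch; for readers who just want the shape of that /health endpoint, here is a hypothetical stand-in (not the real app code) using only the Python standard library:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading
import urllib.request

class HelloApp(BaseHTTPRequestHandler):
    """Hypothetical hello-app: answers the ALB health check on /health."""

    def do_GET(self):
        if self.path == "/health":
            body = b"Server is healthy"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), HelloApp)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

body_text = urllib.request.urlopen(f"http://127.0.0.1:{port}/health").read().decode()
print(body_text)  # prints: Server is healthy
server.shutdown()
```

In the lab, the ALB target group points at this endpoint to decide whether a pod is healthy.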
Now let’s verify that threat detection and monitoring services work. We focus on open-source security tools.
Open CloudShell. Update the kubectl configuration with the command:
~ $ aws eks update-kubeconfig --name lab-eks --region ap-southeast-1
Updated context arn:aws:eks:ap-southeast-1:917566871600:cluster/lab-eks in /home/cloudshell-user/.kube/config
Ensure all pods are ready
~ $ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
amazon-guardduty aws-guardduty-agent-fkzhp 1/1 Running 0 94m
amazon-guardduty aws-guardduty-agent-nxt57 1/1 Running 0 94m
default hello-app-7677c6d695-xd82l 1/1 Running 0 19m
default hello-app-sandboxed-5d5fd9f9df-rwtg9 1/1 Running 0 19m
falco-gvisor falco-gvisor-falcosidekick-66d74d76dd-pthdk 1/1 Running 0 48m
falco-gvisor falco-gvisor-falcosidekick-66d74d76dd-w8zrz 1/1 Running 0 48m
falco-gvisor falco-gvisor-j66z4 2/2 Running 0 48m
falco-gvisor falco-gvisor-tctnc 2/2 Running 0 48m
falco falco-falcosidekick-79778db95c-9m2kt 1/1 Running 0 48m
falco falco-falcosidekick-79778db95c-qj6bm 1/1 Running 0 48m
falco falco-hsmzv 2/2 Running 0 48m
falco falco-zccgc 2/2 Running 0 48m
kube-system aws-node-tjgxr 2/2 Running 0 94m
kube-system aws-node-x8nqh 2/2 Running 0 94m
kube-system coredns-68bb4d6745-wcc2g 1/1 Running 0 97m
kube-system coredns-68bb4d6745-ws8bn 1/1 Running 0 97m
kube-system eks-node-monitoring-agent-9nd8b 1/1 Running 0 93m
kube-system eks-node-monitoring-agent-qlmz8 1/1 Running 0 93m
kube-system eks-pod-identity-agent-2z2fs 1/1 Running 0 93m
kube-system eks-pod-identity-agent-djhkd 1/1 Running 0 93m
kube-system kube-proxy-fsvx5 1/1 Running 0 94m
kube-system kube-proxy-npnfh 1/1 Running 0 94m
kube-system metrics-server-849ccd88cb-9gktf 1/1 Running 0 93m
kube-system metrics-server-849ccd88cb-jw6dp 1/1 Running 0 93m
Let me explain.
The amazon-guardduty add-on is installed by AWS GuardDuty because we let it self-manage:
resource "aws_guardduty_detector_feature" "eks_runtime_monitoring" {
  additional_configuration {
    name   = "EKS_ADDON_MANAGEMENT"
    status = "ENABLED"
  }
}
I deployed two different instances of the demo app: hello-app-7677c6d695-xd82l runs on runc, and hello-app-sandboxed-5d5fd9f9df-rwtg9 runs on runsc (gVisor). The gVisor runtime is installed on each node via a cloud-init script, and Falco is deployed as a DaemonSet by Helm.
The Falco DaemonSet in the falco namespace is the default version that monitors all containers. On the other hand, falco-gvisor runs independently and monitors sandboxed containers only. We actually need only the falco-gvisor version, but I demonstrate both.
Take a look at a normal Falco pod
~ $ kubectl logs -n falco falco-zccgc
Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falco-driver-loader (init), falcoctl-artifact-install (init)
Mon Apr 14 08:34:34 2025: Falco version: 0.40.0 (x86_64)
Mon Apr 14 08:34:34 2025: Falco initialized with configuration files:
Mon Apr 14 08:34:34 2025: /etc/falco/falco.yaml | schema validation: ok
Mon Apr 14 08:34:34 2025: System info: Linux version 6.1.131-143.221.amzn2023.x86_64 (mockbuild@ip-10-0-47-218) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.41-50.amzn2023.0.2) #1 SMP PREEMPT_DYNAMIC Mon Mar 24 15:35:21 UTC 2025
Mon Apr 14 08:34:34 2025: Loading rules from:
Mon Apr 14 08:34:34 2025: /etc/falco/falco_rules.yaml | schema validation: ok
Mon Apr 14 08:34:34 2025: Hostname value has been overridden via environment variable to: ip-10-0-135-59.ap-southeast-1.compute.internal
Mon Apr 14 08:34:34 2025: The chosen syscall buffer dimension is: 8388608 bytes (8 MBs)
Mon Apr 14 08:34:34 2025: Starting health webserver with threadiness 2, listening on 0.0.0.0:8765
Mon Apr 14 08:34:34 2025: Loaded event sources: syscall
Mon Apr 14 08:34:34 2025: Enabled event sources: syscall
Mon Apr 14 08:34:34 2025: Opening 'syscall' source with BPF probe. BPF probe path: /root/.falco/falco-bpf.o
What we got:
Falco initialized successfully with the configuration file /etc/falco/falco.yaml.
Rules were loaded successfully from /etc/falco/falco_rules.yaml.
Falco uses the eBPF kernel driver to monitor the syscall source.
Check the falco-driver-loader container:
~ $ kubectl logs -n falco falco-zccgc -c falco-driver-loader
* Setting up /usr/src links from host
2025-04-14 08:34:27 INFO Running falcoctl driver config
├ name: falco
├ version: 8.0.0+driver
├ type: ebpf
├ host-root: /host
└ repos: https://download.falco.org/driver
2025-04-14 08:34:27 INFO Storing falcoctl driver config
2025-04-14 08:34:27 INFO Running falcoctl driver install
├ driver version: 8.0.0+driver
├ driver type: ebpf
├ driver name: falco
├ compile: true
├ download: true
├ target: amazonlinux2023
├ arch: x86_64
├ kernel release: 6.1.131-143.221.amzn2023.x86_64
└ kernel version: #1 SMP PREEMPT_DYNAMIC Mon Mar 24 15:35:21 UTC 2025
2025-04-14 08:34:27 INFO Removing eBPF probe symlink
└ path: /root/.falco/falco-bpf.o
2025-04-14 08:34:27 INFO Trying to download a driver.
└ url: https://download.falco.org/driver/8.0.0%2Bdriver/x86_64/falco_amazonlinux2023_6.1.131-143.221.amzn2023.x86_64_1.o
2025-04-14 08:34:27 INFO Driver downloaded.
└ path: /root/.falco/8.0.0+driver/x86_64/falco_amazonlinux2023_6.1.131-143.221.amzn2023.x86_64_1.o
2025-04-14 08:34:27 INFO Symlinking eBPF probe
├ src: /root/.falco/8.0.0+driver/x86_64/falco_amazonlinux2023_6.1.131-143.221.amzn2023.x86_64_1.o
└ dest: /root/.falco/falco-bpf.o
2025-04-14 08:34:27 INFO eBPF probe symlinked
Now we can be sure Falco is using the driver correctly. Check the falcosidekick status:
~ $ kubectl logs -n falco falco-falcosidekick-79778db95c-qj6bm
2025/04/14 08:34:37 [INFO] : Falcosidekick version: 2.31.1
2025/04/14 08:34:37 [INFO] : Enabled Outputs: [AWSLambda]
2025/04/14 08:34:37 [INFO] : Falcosidekick is up and listening on :2801
The logs implicitly confirm that FalcoSidekick successfully assumed the required role and integrated with AWS Lambda. From now on, FalcoSidekick hooks Falco’s findings and sends them to the corresponding AWS Lambda function.
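The Lambda side of this hook is essentially a format translation: Falco emits a JSON event, and the function reshapes it into an AWS Security Finding Format (ASFF) finding before importing it into Security Hub via the BatchImportFindings API. The sketch below shows only the reshaping step; the field choices and severity mapping are illustrative, not the exact falco_handler code.

```python
import hashlib

def falco_to_asff(event, account_id, region):
    """Reshape a Falco alert into a minimal ASFF finding dict (illustrative)."""
    severity = {"Critical": "CRITICAL", "Error": "HIGH",
                "Warning": "MEDIUM", "Notice": "LOW"}.get(event["priority"], "INFORMATIONAL")
    return {
        "SchemaVersion": "2018-10-08",
        # Deterministic Id so repeated alerts update the same finding.
        "Id": hashlib.sha256(event["output"].encode()).hexdigest(),
        "ProductArn": f"arn:aws:securityhub:{region}:{account_id}:product/{account_id}/default",
        "GeneratorId": event["rule"],
        "AwsAccountId": account_id,
        "Types": ["Unusual Behaviors/Container"],
        "CreatedAt": event["time"],
        "UpdatedAt": event["time"],
        "Severity": {"Label": severity},
        "Title": event["rule"],
        "Description": event["output"],
        "Resources": [{"Type": "Container", "Id": event["output_fields"]["container.id"]}],
    }

# A trimmed Falco event, as FalcoSidekick would POST it to the Lambda.
sample = {
    "rule": "Terminal shell in container",
    "priority": "Notice",
    "time": "2025-04-14T09:20:04Z",
    "output": "A shell was spawned in a container...",
    "output_fields": {"container.id": "803c85bca405"},
}
finding = falco_to_asff(sample, "917566871600", "ap-southeast-1")
print(finding["Severity"], finding["Title"])
```

Note the trailing "default" in the ProductArn: findings imported by your own integrations must use the account's default product ARN, which matters for the point about the Default name below.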
Let’s see how the Falco default version protects a normal container. We spawn a shell inside the application container.
~ $ kubectl exec -it hello-app-7677c6d695-xd82l -- /bin/sh
~ # whoami
root
We can see that FalcoSidekick is working:
~ $ kubectl logs -n falco falco-falcosidekick-79778db95c-qj6bm
2025/04/14 08:34:37 [INFO] : Falcosidekick version: 2.31.1
2025/04/14 08:34:37 [INFO] : Enabled Outputs: [AWSLambda]
2025/04/14 08:34:37 [INFO] : Falcosidekick is up and listening on :2801
2025/04/14 09:20:04 [INFO] : AWS Lambda - Invoke OK (200)
Security Hub receives a new finding from Falco, alongside other findings from AWS native tools like Amazon Inspector.
The Default value here is not a default setting or a random name. Please check the AWS Security Hub documentation and the example source code in the falco_handler Lambda. Any misconfiguration may break the integration between Falco and Security Hub.
The new finding is also sent to the email channel.
The Falco default version works. Now, check the Falco-gVisor version.
First, we need to step back a bit to understand how the Falco + gVisor integration works under the hood.
Falco is an engine that evaluates system call data and raises alerts when the data matches its rules. Where does the data come from? From the Falco kernel driver.
The situation changes when gVisor comes into play. gVisor isolates the container from the host Linux kernel and limits the system calls available. So, if Falco uses the standard kernel driver, the collected data is not helpful; Falco will spam alerts like this.
On the other hand, gVisor has a component called Sentry that can act as a kernel driver. Instead of using the built-in driver, Falco uses gVisor as a driver: Falco opens a Unix domain socket and collects data from it, while on the gVisor side, Sentry connects to this socket and sends data to Falco. This connection is called a Sink.
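The transport mechanics of that Sink can be sketched in a few lines of Python: one side plays Falco (binding a Unix domain socket and reading from it), the other plays Sentry (connecting and writing an event). This is only a toy of the socket handshake, not gVisor's actual wire protocol.

```python
import os
import socket
import tempfile
import threading

# Stand-in for /run/containerd/runsc/falco.sock.
sock_path = os.path.join(tempfile.mkdtemp(), "falco.sock")

# "Falco" side: open the Unix domain socket and wait for the sink to connect.
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(sock_path)
srv.listen(1)

received = []

def falco_side():
    conn, _ = srv.accept()
    with conn:
        received.append(conn.recv(1024).decode())

t = threading.Thread(target=falco_side)
t.start()

# "Sentry" side: connect to the sink endpoint and stream a syscall event.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sentry:
    sentry.connect(sock_path)
    sentry.sendall(b'{"syscall": "execve", "path": "/bin/sh"}')

t.join()
srv.close()
print(received[0])
```

The real integration works the same way around: Falco owns falco.sock, and Sentry pushes sandboxed containers' syscall data into it.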
We need to ensure the gVisor runtime is configured and used correctly. Here is Falco's configuration:
driver:
  enabled: true
  kind: gvisor
  gvisor:
    runsc:
      path: /usr/local/bin
      root: /run/containerd/runsc
      config: /etc/containerd/config.toml
Let’s connect to the node via EC2 Instance Connect Endpoint to check the real configuration.
[root@ip-10-0-148-41 ~]# ls /etc/containerd/
base-runtime-spec.json config-backup.toml config.toml
The base-runtime-spec.json and config-backup.toml are default configuration files. Check whether config.toml is modified correctly:
[root@ip-10-0-148-41 ~]# cat /etc/containerd/config.toml
version = 2
root = "/var/lib/containerd"
state = "/run/containerd"
[grpc]
address = "/run/containerd/containerd.sock"
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
discard_unpacked_layers = true
[plugins."io.containerd.grpc.v1.cri"]
sandbox_image = "localhost/kubernetes/pause"
enable_cdi = true
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d:/etc/docker/certs.d"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
base_runtime_spec = "/etc/containerd/base-runtime-spec.json"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = "/usr/sbin/runc"
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
runtime_type = "io.containerd.runsc.v1"
pod-init-config = "/run/containerd/runsc/pod-init.json"
Look at the content of /run/containerd/runsc/pod-init.json. The Sink connection between Falco and gVisor is /run/containerd/runsc/falco.sock.
{
  "sinks": [
    {
      "config": {
        "endpoint": "/run/containerd/runsc/falco.sock",
        "retries": 3
      },
      "ignore_setup_error": true,
      "name": "remote"
    }
  ]
}
Note that we only add the runsc runtime_type entry to config.toml; the pod-init-config file is auto-generated and comes from another source.
Check whether the runsc root path exists. We find some .state and .sock files, which means some containers are using runsc.
[root@ip-10-0-148-41 ~]# ls /run/containerd/runsc/k8s.io/
a07f4d498111ebe586c50f1c274c935111489b19715d747b38fde31e10f329f1_sandbox:e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35.lock
a07f4d498111ebe586c50f1c274c935111489b19715d747b38fde31e10f329f1_sandbox:e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35.state
e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35_sandbox:e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35.lock
e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35_sandbox:e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35.state
runsc-e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35.sock
We can see two containers using the same image 917566871600.dkr.ecr.ap-southeast-1.amazonaws.com/lab/hello-app:hello-app-build-1623 but running on different runtimes. Take a look at containers 803c85bca4… and a07f4d4981….
[root@ip-10-0-148-41 ~]# sudo ctr -n k8s.io container ls
CONTAINER IMAGE RUNTIME
00c695a4210d8bf87791b4346e956e59ab33a6724711797033494c449adb8b49 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/amazon/aws-network-policy-agent:v1.1.6-eksbuild.1 io.containerd.runc.v2
071a446aada7a7799ff8ea90c85fc2e8690239130cdc9c9d227fb365b3e3a99d docker.io/falcosecurity/falcosidekick:2.31.1 io.containerd.runc.v2
2c8a2152df26778b8dc6f7fc3f6d5720ccc3fd58e40f55b596e61fe3961783cf 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
32116969400baa71075b12ad6e89c0166d269f6457d84f65f570a39c71e5c3e2 docker.io/falcosecurity/falco:0.40.0-debian io.containerd.runc.v2
445e8305cd56cc8e64a399cbeeefb3f8135c947a4ad21315ea8626e30dba8287 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/metrics-server:v0.7.2-eksbuild.2 io.containerd.runc.v2
450df4f61bfff5a499f279d6075968321f6527cb8c6f4aff29e6f42896b38d0b 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/aws-guardduty-agent:v1.9.0 io.containerd.runc.v2
4d05d1828f0aefc57e1f0796d5748e129c5515464bbeefce08c3f6e11ca53ad1 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
4e4e83c9729d16826d66625d8cc11a8f60757b5d6e0bffb34ff9202f93aebab4 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
504e87f64df0b3501929d94cb98871fbb1891a8f7182e77b4cf5e55e14555809 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/amazon-k8s-cni-init:v1.19.2-eksbuild.1 io.containerd.runc.v2
56fe9b55c88654cbf51d4d046190d3a82cece66579b5250e51608e7fe383b3e4 docker.io/falcosecurity/falcoctl:0.11.0 io.containerd.runc.v2
5786a49427b1e934d684e3ba7d1c3c5bd2c60bd90a827665a5f5f87c5ba56833 docker.io/falcosecurity/falcoctl:0.11.0 io.containerd.runc.v2
618cabdd25dbcd835fa1866fa03c4c3472257075d259556062452f33faf66ad0 docker.io/falcosecurity/falco-driver-loader:0.40.0 io.containerd.runc.v2
78c58ce2bf4961f73ad7590098d4bb69d12701a7406a1d4e7c02c52c00c0583f docker.io/falcosecurity/falcosidekick:2.31.1 io.containerd.runc.v2
803c85bca405bd9ccad7343b295e84b5a331358a722e69bea61bce308e2025d7 917566871600.dkr.ecr.ap-southeast-1.amazonaws.com/lab/hello-app:hello-app-build-1623 io.containerd.runc.v2
80473a9eeeeba90e51e1191c93244a8a20852e18e984f0ab308eac6e267d82ab 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/eks-node-monitoring-agent:v1.2.0-eksbuild.1 io.containerd.runc.v2
82d26f7053e0046b20f98ba241f6fdfdfd85a176e4f601449a1e9997a303f27a docker.io/falcosecurity/falcoctl:0.11.0 io.containerd.runc.v2
8db04f6715c0ab70f33a11d3cfda33f06ca75a28fe8c078d6ba3134b2cfc4df6 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/kube-proxy:v1.32.0-minimal-eksbuild.2 io.containerd.runc.v2
9e74b7c3072373e5028cf4bca8849b1fc0b88757bcd97f2804136a4321864aac 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
a07f4d498111ebe586c50f1c274c935111489b19715d747b38fde31e10f329f1 917566871600.dkr.ecr.ap-southeast-1.amazonaws.com/lab/hello-app:hello-app-build-1623 io.containerd.runsc.v1
a5a00c183537b3330798353975b400704d3fffd2dc321cb6fd211247391a37f2 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/amazon-k8s-cni:v1.19.2-eksbuild.1 io.containerd.runc.v2
a73e226ca0822bdd5abe33b7aeb721581958f54054ebc68d104d4c4bd13063db 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
b23193b6d3c322d0fd24342e9b9ffe8928c51d16ba82342d4251dd175d5b4468 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/eks-pod-identity-agent:v0.1.21 io.containerd.runc.v2
b530e0f18f48322545f591b801d638ea1a6d818441d61bccd72cca62e01d7d46 docker.io/falcosecurity/falcoctl:0.11.0 io.containerd.runc.v2
c489b92bf8a51a74b2fb3122bcd6b109b958d61952a49c304cf68e298cee62ac 602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/eks/eks-pod-identity-agent:v0.1.21 io.containerd.runc.v2
c59cea7fd8637c428683b298fd88a33ed3be533f1e24c74866f23d9c60864a54 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
d0fcda46418f3f23043735b321833976d9855eda9d71515e8905c8cdd3ecc4c0 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
d1a3839757a918db69f536fe9131feac860584854c22e6d62024fe46985eaefd 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
d5d9bc952f53f95c468d3c5b608ecc05917581000585719aac98978a3b206e66 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
dd2b7d322686a8e8c67b23a47fb3f18db703d69268fc4d9c46eba58b9f389259 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
dd54d707ccb39a0d76eee6cb5802767c292428850bd7f51cf3259503107a303a 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runc.v2
e68a0fe1f036f06a9a53611cc40070c5cbf146fcb44ed8f889976a113db9aa35 602401143452.dkr.ecr.us-west-2.amazonaws.com/eks/pause:3.10 io.containerd.runsc.v1
ef10441d987d81d66ad597c5cb93ee91530a63a7a4ac109ec266518044cb9f66 docker.io/falcosecurity/falco:0.40.0-debian io.containerd.runc.v2
f0f8235215e6aba2adb84490fb3bbc4dcd22953a2dba35380fc5d17e20290f60 docker.io/falcosecurity/falco:0.40.0-debian io.containerd.runc.v2
All the other containers run on io.containerd.runc.v2, which is expected behavior. We have now ensured that the gVisor runtime is correctly configured and used.
Come back to the CloudShell to check the Falco-gVisor pod
~ $ kubectl logs -n falco-gvisor falco-gvisor-j66z4
Defaulted container "falco" out of: falco, falcoctl-artifact-follow, falco-gvisor-init (init), falcoctl-artifact-install (init)
Mon Apr 14 08:34:35 2025: Falco version: 0.40.0 (x86_64)
Mon Apr 14 08:34:35 2025: CLI args: /usr/bin/falco -pk
Mon Apr 14 08:34:35 2025: Falco initialized with configuration files:
Mon Apr 14 08:34:35 2025: /etc/falco/falco.yaml | schema validation: ok
Mon Apr 14 08:34:35 2025: [libs]: Cannot read host init process proc root: 13
Mon Apr 14 08:34:35 2025: [libs]: Cannot read host init process proc root: 13
Mon Apr 14 08:34:35 2025: Enabled container engine 'docker'
Mon Apr 14 08:34:35 2025: Enabled container engine 'CRI'
Mon Apr 14 08:34:35 2025: Enabled container runtime socket at '/run/containerd/containerd.sock' via config file
Mon Apr 14 08:34:35 2025: Enabled container runtime socket at '/run/crio/crio.sock' via config file
Mon Apr 14 08:34:35 2025: Configured rules filenames:
Mon Apr 14 08:34:35 2025: /etc/falco/falco_rules.yaml
Mon Apr 14 08:34:35 2025: Loading rules from:
Mon Apr 14 08:34:35 2025: /etc/falco/falco_rules.yaml | schema validation: ok
Mon Apr 14 08:34:35 2025: Hostname value has been overridden via environment variable to: ip-10-0-135-59.ap-southeast-1.compute.internal
Mon Apr 14 08:34:35 2025: Watching file '/etc/falco/falco.yaml'
Mon Apr 14 08:34:35 2025: Watching file '/etc/falco/falco_rules.yaml'
Mon Apr 14 08:34:35 2025: (19) syscalls in rules: connect, dup, dup2, dup3, execve, execveat, finit_module, init_module, link, linkat, open, openat, openat2, ptrace, sendmsg, sendto, socket, symlink, symlinkat
Mon Apr 14 08:34:35 2025: +(53) syscalls (Falco's state engine set of syscalls): accept, accept4, bind, capset, chdir, chroot, clone, clone3, close, creat, epoll_create, epoll_create1, eventfd, eventfd2, fchdir, fcntl, fork, getsockopt, inotify_init, inotify_init1, io_uring_setup, memfd_create, mount, open_by_handle_at, pidfd_getfd, pidfd_open, pipe, pipe2, prctl, prlimit, procexit, recvfrom, recvmmsg, recvmsg, sendmmsg, setgid, setpgid, setregid, setresgid, setresuid, setreuid, setrlimit, setsid, setuid, shutdown, signalfd, signalfd4, socketpair, timerfd_create, umount, umount2, userfaultfd, vfork
Mon Apr 14 08:34:35 2025: (72) syscalls selected in total (final set): accept, accept4, bind, capset, chdir, chroot, clone, clone3, close, connect, creat, dup, dup2, dup3, epoll_create, epoll_create1, eventfd, eventfd2, execve, execveat, fchdir, fcntl, finit_module, fork, getsockopt, init_module, inotify_init, inotify_init1, io_uring_setup, link, linkat, memfd_create, mount, open, open_by_handle_at, openat, openat2, pidfd_getfd, pidfd_open, pipe, pipe2, prctl, prlimit, procexit, ptrace, recvfrom, recvmmsg, recvmsg, sendmmsg, sendmsg, sendto, setgid, setpgid, setregid, setresgid, setresuid, setreuid, setrlimit, setsid, setuid, shutdown, signalfd, signalfd4, socket, socketpair, symlink, symlinkat, timerfd_create, umount, umount2, userfaultfd, vfork
Mon Apr 14 08:34:35 2025: Starting health webserver with threadiness 2, listening on 0.0.0.0:8765
Mon Apr 14 08:34:35 2025: Enabled rules:
Mon Apr 14 08:34:35 2025: Directory traversal monitored file read
Mon Apr 14 08:34:35 2025: Read sensitive file trusted after startup
Mon Apr 14 08:34:35 2025: Read sensitive file untrusted
Mon Apr 14 08:34:35 2025: Run shell untrusted
Mon Apr 14 08:34:35 2025: System user interactive
Mon Apr 14 08:34:35 2025: Terminal shell in container
Mon Apr 14 08:34:35 2025: Contact K8S API Server From Container
Mon Apr 14 08:34:35 2025: Netcat Remote Code Execution in Container
Mon Apr 14 08:34:35 2025: Search Private Keys or Passwords
Mon Apr 14 08:34:35 2025: Clear Log Activities
Mon Apr 14 08:34:35 2025: Remove Bulk Data from Disk
Mon Apr 14 08:34:35 2025: Create Symlink Over Sensitive Files
Mon Apr 14 08:34:35 2025: Create Hardlink Over Sensitive Files
Mon Apr 14 08:34:35 2025: Packet socket created in container
Mon Apr 14 08:34:35 2025: Redirect STDOUT/STDIN to Network Connection in Container
Mon Apr 14 08:34:35 2025: Linux Kernel Module Injection Detected
Mon Apr 14 08:34:35 2025: Debugfs Launched in Privileged Container
Mon Apr 14 08:34:35 2025: Detect release_agent File Container Escapes
Mon Apr 14 08:34:35 2025: PTRACE attached to process
Mon Apr 14 08:34:35 2025: PTRACE anti-debug attempt
Mon Apr 14 08:34:35 2025: Find AWS Credentials
Mon Apr 14 08:34:35 2025: Execution from /dev/shm
Mon Apr 14 08:34:35 2025: Drop and execute new binary in container
Mon Apr 14 08:34:35 2025: Disallowed SSH Connection Non Standard Port
Mon Apr 14 08:34:35 2025: Fileless execution via memfd_create
Mon Apr 14 08:34:35 2025: (25) enabled rules in total
Mon Apr 14 08:34:35 2025: Loaded event sources: syscall
Mon Apr 14 08:34:35 2025: Enabled event sources: syscall
Mon Apr 14 08:34:35 2025: Opening event source 'syscall'
Mon Apr 14 08:34:35 2025: Opening 'syscall' source with gVisor. Configuration path: /gvisor-config/pod-init.json
Mon Apr 14 08:34:35 2025: [libs]: Trying to open the right engine!
The logs are verbose because I enabled debug mode for Falco’s logging. They give us some useful information:
Falco is configured and its rules are loaded successfully.
The enabled rule list includes the Terminal shell in container rule.
Falco is using gVisor (Sentry) as its kernel driver instead of eBPF.
So Falco seems ready. Ensure FalcoSidekick is ready too:
~ $ kubectl logs -n falco-gvisor falco-gvisor-falcosidekick-66d74d76dd-pthdk
2025/04/14 08:34:54 [INFO] : Falcosidekick version: 2.31.1
2025/04/14 08:34:54 [INFO] : Enabled Outputs: [AWSLambda]
2025/04/14 08:34:54 [INFO] : Falcosidekick is up and listening on :2801
One thing to notice: (*) the configuration path in the logs (Configuration path: /gvisor-config/pod-init.json) differs from the value in config.toml (pod-init-config = "/run/containerd/runsc/pod-init.json"). To understand why, we need to check the falco-gvisor-init container logs:
~ $ kubectl logs -n falco-gvisor falco-gvisor-j66z4 -c falco-gvisor-init
* Configuring Falco+gVisor integration....
* Checking for /host/etc/containerd/config.toml file...
* Generating the Falco configuration...
2025-04-14T08:34:27+0000: Falco version: 0.40.0 (x86_64)
2025-04-14T08:34:27+0000: Falco initialized with configuration files:
2025-04-14T08:34:27+0000: /etc/falco/falco.yaml | schema validation: ok
2025-04-14T08:34:27+0000: System info: Linux version 6.1.131-143.221.amzn2023.x86_64 (mockbuild@ip-10-0-47-218) (gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), GNU ld version 2.41-50.amzn2023.0.2) #1 SMP PREEMPT_DYNAMIC Mon Mar 24 15:35:21 UTC 2025
* Setting the updated Falco configuration to /gvisor-config/pod-init.json...
* Falco+gVisor correctly configured.
We found the key information: Falco+gVisor is correctly configured, with the config path /gvisor-config/pod-init.json. Falco confirmed that the Falco-gVisor integration is in place. But where do these logs come from? Digging into the Falco source code on GitHub, I found an interesting file, helpers.tpl, which bootstraps the Falco-gVisor integration when Falco is deployed. The file is very verbose; we only need to focus on a few lines:
/usr/bin/falco --gvisor-generate-config=${root}/falco.sock > /host${root}/pod-init.json
sed 's/"endpoint" : "\/run/"endpoint" : "\/host\/run/' /host${root}/pod-init.json > /gvisor-config/pod-init.json
That explains the (*). The integration between Falco and gVisor is perfectly configured. Let’s spawn a shell inside the isolated container.
~ $ kubectl exec -it hello-app-sandboxed-5d5fd9f9df-rwtg9 -- /bin/sh
~ # whoami
root
But NO alert fired.
Currently, the Falco-gVisor integration works on:
Container engines only: Docker or containerd.
Google Kubernetes Engine (supported out of the box).
Other Kubernetes environments, like Minikube.
The Falco-gVisor integration does not work on AWS EKS.
Root cause: gVisor's Sentry cannot send data to Falco via falco.sock.
Every day at noon UTC+0 (7 PM in Vietnam), we receive a daily security report in our mailbox.
4. Limitations and improvements needed
The AWS Config rules in the demo source code are AWS-managed rules only. We need to write more custom rules for the EKS cluster to improve security.
Among the AWS Config rules, we have this one:
resource "aws_config_config_rule" "eks_endpoint_no_public_access" {
  name        = "eks-endpoint-no-public-access"
  description = "Ensures EKS cluster endpoints are not publicly accessible to minimize unauthorized access risks."
  source {
    owner             = "AWS"
    source_identifier = "EKS_ENDPOINT_NO_PUBLIC_ACCESS"
  }
  scope {
    compliance_resource_types = ["AWS::EKS::Cluster"]
  }
}
But for now, the Kubernetes API server endpoint is publicly accessible. We need to set up a VPN solution to resolve this.
The Falco-gVisor integration is not working as of now. I’m preparing to open an issue in the Falco GitHub repository.
Written by Tùng Nguyễn Thanh