Navigating NACLs and Security Groups for Session Manager Connectivity in AWS

I frequently create and manage EC2 instances in AWS. There are several ways to access them, but AWS Systems Manager (SSM) Session Manager is my preferred method because it is both secure and seamless. Setting up the Session Manager connection properly can be tricky, though, and even with the correct network configuration, the brief delay before the connection becomes available can leave you unsure whether it worked. This guide aims to demystify the required network configuration and provide the essential information for enabling secure connections in a VPC.

💡
While this guide focuses on Session Manager connectivity, the principles and configurations discussed here apply broadly to securing any inbound or outbound connections to your VPC-bound workloads. Understanding these concepts will enhance your overall AWS network security practices.

The Challenge

Setting up secure access control for traffic within your VPC and to your instances can be a headache. The AWS documentation doesn't cover the solution provided here, and the Stack Overflow answers here did not do it justice. This isn't an issue in the default VPC, or in any less locked-down VPC, because the default firewall rules already allow this traffic, but it is good practice to apply security best practices to all your work.

The Solution

This assumes that your VPC, subnet, and instance have outbound internet access and that your instance meets all the Session Manager prerequisites. If you're aiming to implement the principle of least privilege in your network configuration, as I do, you can apply the following settings for Session Manager:

Create custom     Allow inbound port    Allow outbound port
Security Group    -                     443
NACL              1024 - 65535          443

  • Security Group configurations

Security Group Inbound Rule

Security Group Outbound Rule

  • Network ACL configurations

  • Session Manager Enabled

This configuration can be validated with this CloudFormation template.
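
For readers who would rather script the rules than click through the console, the table above can also be written down as plain Python data. This is only a minimal sketch and not the CloudFormation template itself: the helper names, rule numbers, and placeholder IDs are my own assumptions, and the dictionaries are shaped like the keyword arguments of boto3's `create_network_acl_entry` and `authorize_security_group_egress`, so they could be passed to those calls.

```python
# Sketch: the least-privilege rules from the table, expressed as plain data.
# IDs, rule numbers and the CIDR are placeholders (assumptions), not real values.

EPHEMERAL = (1024, 65535)  # ephemeral range used in this article
HTTPS = (443, 443)


def nacl_entry(acl_id, rule_number, egress, ports, cidr="0.0.0.0/0"):
    """Build kwargs shaped like boto3 ec2.create_network_acl_entry(**kwargs)."""
    return {
        "NetworkAclId": acl_id,
        "RuleNumber": rule_number,
        "Protocol": "6",  # protocol number 6 = TCP
        "RuleAction": "allow",
        "Egress": egress,
        "CidrBlock": cidr,
        "PortRange": {"From": ports[0], "To": ports[1]},
    }


def sg_egress(group_id, ports, cidr="0.0.0.0/0"):
    """Build kwargs shaped like boto3 ec2.authorize_security_group_egress(**kwargs)."""
    return {
        "GroupId": group_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": ports[0],
            "ToPort": ports[1],
            "IpRanges": [{"CidrIp": cidr}],
        }],
    }


def session_manager_rules(acl_id, group_id):
    """Return the NACL and security group rules described in the table."""
    return {
        "nacl": [
            nacl_entry(acl_id, 100, egress=True, ports=HTTPS),       # outbound 443
            nacl_entry(acl_id, 100, egress=False, ports=EPHEMERAL),  # inbound replies
        ],
        # No security group ingress rule is needed for Session Manager.
        "security_group_egress": [sg_egress(group_id, HTTPS)],
    }


if __name__ == "__main__":
    import pprint
    pprint.pprint(session_manager_rules("acl-EXAMPLE", "sg-EXAMPLE"))
```

Keep in mind that a newly created security group starts with an allow-all outbound rule, so for a least-privilege setup that default rule would have to be removed as well.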

The Key Insight: Ephemeral Ports

The trick is enabling traffic for the ephemeral ports in your NACLs. But why are ephemeral ports needed?

Ephemeral ports are temporary ports assigned by the operating system for the client side of a connection. When a client initiates a connection to a server, the destination is a well-known port (like 443 for HTTPS), while the client's own source port is chosen by the operating system from the ephemeral port range. The server addresses its return traffic to that source port. The exact range depends on the client's operating system, which is why the full 1024-65535 range is typically allowed.
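
If you want to see an ephemeral port for yourself, the following small Python sketch opens a TCP connection over loopback and prints the source port the operating system picked for the client. The throwaway local server and the function name are assumptions made purely for illustration; no AWS resources are involved.

```python
import socket


def observe_ephemeral_port():
    """Open a loopback TCP connection and return (server_port, client_source_port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        # Port 0 asks the OS to pick any free port for this throwaway server.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        server_port = server.getsockname()[1]

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            # The client names only the destination; the OS assigns the source port.
            client.connect(("127.0.0.1", server_port))
            client_port = client.getsockname()[1]

            conn, peer = server.accept()
            conn.close()

    # The server sees the client's ephemeral port as the peer port,
    # and that is the port its replies are addressed to.
    assert peer[1] == client_port
    return server_port, client_port


if __name__ == "__main__":
    dst, src = observe_ephemeral_port()
    print(f"destination port: {dst}, client source (ephemeral) port: {src}")
```

Running it a few times should print a different source port on most runs, which is why a NACL cannot allow just one fixed port for the return traffic.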

In the context of Session Manager:

  1. Your instance initiates an outbound connection to the AWS Systems Manager service on port 443.

  2. The response from the service comes back to the ephemeral source port that your instance used for that connection.

  3. If your NACLs don't allow inbound traffic on these ephemeral ports, the response can't reach your instance, breaking the connection.

This is why we need to allow inbound traffic on ports 1024-65535 in the NACL.
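
The three steps above can be modelled in a few lines of Python. This is a simplified sketch of stateless rule evaluation, not AWS code: the `Rule` class and function names are hypothetical, and only ports are modelled (no protocols or CIDR ranges). It keeps the real NACL behaviour of evaluating rules in ascending rule-number order and denying whatever matches no rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    number: int     # lower numbers are evaluated first
    port_from: int
    port_to: int
    allow: bool


def nacl_allows(rules, port):
    """Return the verdict of the first rule (by number) that matches the port.

    A packet that matches no rule falls through to the implicit deny.
    """
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False


def session_round_trip(inbound, outbound, client_port):
    """Check both legs of an instance-initiated HTTPS connection."""
    # Outbound leg: the request is matched on its destination port, 443.
    request_ok = nacl_allows(outbound, 443)
    # Inbound leg: the reply is addressed to the instance's ephemeral port.
    # A stateless NACL has no memory of the request, so it needs its own rule.
    response_ok = nacl_allows(inbound, client_port)
    return request_ok and response_ok


if __name__ == "__main__":
    outbound = [Rule(100, 443, 443, True)]
    # No inbound rule: the reply is dropped.
    print(session_round_trip([], outbound, 50123))  # False
    # Inbound ephemeral range allowed: the round trip completes.
    print(session_round_trip([Rule(100, 1024, 65535, True)], outbound, 50123))  # True
```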

💡
Similarly, if you configure an inbound rule that allows traffic to a specific port, you also need a corresponding outbound rule that allows the ephemeral port range, so that the responses to that inbound traffic can leave the subnet.

Understanding Stateful vs. Stateless

You might wonder why you don't need to enable the same ephemeral port range in Security Groups (SGs). The answer lies in the fundamental difference between these two security layers:

  • Security Groups are stateful

  • NACLs are stateless

But what exactly do "stateful" and "stateless" mean in this context?

Stateful - Security Groups

Security Groups (SG) keep track of the state of traffic. When an outbound request is allowed, the SG automatically allows the corresponding inbound response, regardless of the inbound rules. This is why you don't need to explicitly allow the ephemeral port range in SGs.
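
To make the idea of connection tracking concrete, here is a toy Python model of a stateful firewall. It is a deliberately simplified sketch and not how AWS implements security groups: the class name is hypothetical, and flows are tracked by port pair only, without the IP addresses and protocol a real implementation would include.

```python
class SecurityGroupModel:
    """Toy stateful firewall: replies to tracked outbound flows are allowed."""

    def __init__(self, egress_ports, ingress_ports=()):
        self.egress_ports = set(egress_ports)
        self.ingress_ports = set(ingress_ports)
        self.tracked = set()  # (local_port, remote_port) of allowed outbound flows

    def send(self, local_port, remote_port):
        """Outbound packet: allowed only if an egress rule covers the remote port."""
        if remote_port not in self.egress_ports:
            return False
        self.tracked.add((local_port, remote_port))  # remember the flow
        return True

    def receive(self, local_port, remote_port):
        """Inbound packet: allowed if it answers a tracked flow,
        or if an ingress rule covers the local (destination) port."""
        if (local_port, remote_port) in self.tracked:
            return True
        return local_port in self.ingress_ports


if __name__ == "__main__":
    sg = SecurityGroupModel(egress_ports={443})  # no inbound rules at all
    print(sg.receive(50123, 443))  # False: unsolicited inbound traffic
    print(sg.send(50123, 443))     # True: egress rule allows HTTPS
    print(sg.receive(50123, 443))  # True: reply to the tracked flow
```

The same inbound packet is rejected before the outbound request and accepted after it, even though the model has no inbound rule. That remembered state is exactly what a NACL lacks.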

Stateless - NACLs

NACLs do not keep track of traffic state. Each packet is evaluated against the NACL rules independently, regardless of any previous packets. This means you need to explicitly allow both the outbound request and the inbound response in your NACL rules.

As AWS documentation states:

"NACLs are stateless, which means that information about previously sent or received traffic is not saved. If, for example, you create a NACL rule to allow specific inbound traffic to a subnet, responses to that traffic are not automatically allowed. This is in contrast to how security groups work. Security groups are stateful, which means that information about previously sent or received traffic is saved. If, for example, a security group allows inbound traffic to an EC2 instance, responses are automatically allowed regardless of outbound security group rules."

Improvements

The connection doesn't have to go out over the internet, especially when the traffic is between AWS services. In that case, you can create VPC endpoints with AWS PrivateLink for the Session Manager connection. This way, the traffic between SSM and EC2 stays private and is restricted to the Amazon network. See the setup information here from AWS re:Post.
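
As a rough illustration of the endpoint setup, the sketch below builds the request parameters for the interface endpoints commonly listed for Session Manager (`ssm`, `ssmmessages`, and `ec2messages`). The function name and all IDs are hypothetical placeholders, and the dictionaries are shaped like the keyword arguments of boto3's `create_vpc_endpoint`. Treat the service list as an assumption and check the current AWS documentation for the endpoints your SSM Agent version actually needs.

```python
def session_manager_endpoint_requests(region, vpc_id, subnet_ids, security_group_ids):
    """Build kwargs shaped like boto3 ec2.create_vpc_endpoint(**kwargs),
    one dictionary per interface endpoint."""
    # Assumed service list; verify against the current AWS documentation.
    services = ("ssm", "ssmmessages", "ec2messages")
    return [
        {
            "VpcEndpointType": "Interface",
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.{service}",
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
            "PrivateDnsEnabled": True,  # keep the default service hostnames working
        }
        for service in services
    ]


if __name__ == "__main__":
    requests = session_manager_endpoint_requests(
        "eu-west-1", "vpc-EXAMPLE", ["subnet-EXAMPLE"], ["sg-EXAMPLE"]
    )
    for request in requests:
        print(request["ServiceName"])
```

With this approach, the security group attached to the endpoints needs to allow inbound traffic on port 443 from your instances, while the instance's own security group still only needs outbound 443.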

As a DevOps practitioner, I have created a Terraform module here that helps set up these firewall configurations (SG and NACL) for SSM dynamically. I also plan to add support for VPC endpoint connections in the future.

Conclusion

Understanding the role of ephemeral ports and the differences between stateful and stateless firewalls is crucial when configuring secure AWS environments. While implementing these custom configurations may seem like extra work, the enhanced security and precise control over your network traffic are well worth the effort. By mastering these concepts, you'll be better equipped to design and maintain secure, efficient AWS infrastructures.

Written by

Foluso Ogunsakin

Servant for IT Infrastructure and Cloud Computing.