Fortifying our AWS Organization with temporary credentials
As organizations increasingly adopt cloud services such as Amazon Web Services (AWS) to improve their digital infrastructure, managing access permissions becomes a crucial concern. In AWS, addressing ad hoc access within an AWS Organization can be especially challenging.
Background story
In our day-to-day work, our cloud operations team manages more than a hundred accounts within our AWS Organization. Granting user access to an account might appear straightforward with just a few accounts, but as we continue to grow, managing a vast array of permissions presents its own distinct challenges.
For example, an operator may need to request temporary access to a project team's AWS account to perform maintenance work. With hundreds of requests like this every day, our team was burning out. We realized that something had to change.
In this blog post, we will explore JIT, our internal access manager that helps make cloud access more secure.
Solution
Why not use an existing solution, you may ask. Although there are some solutions out there from CyberArk and even AWS itself, we have a few constraints that are convincing enough to create our own masterpiece:
We need a more customizable approval step. An approval step might come from different people, with the ability for an approver to delegate their job to others while they're busy, for example, on vacation.
We need a mechanism for grouping permission sets. In AWS, once a permission set is created, it can be assigned to any account in the AWS Organization, so we need a way to control which permission sets can be requested for which accounts.
We want to allow ticket requesters to extend their ticket time frame.
Our requester might be a team manager who wants to request tickets for other users.
With that being said, let's sketch a rough set of product requirements:
As a requester, I want to be able to:
Create a ticket for myself, for other people, or both. The ticket should include the start time, the end time, the desired permission set, the account to be accessed, and a brief description explaining the rationale for the ticket.
If my ticket is rejected, I want to edit and resend it to the approver(s).
If I need more time to complete my task, I can request that my approvers extend the end time of my ticket.
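The requester requirements above can be sketched as a small data type. This is only an illustration: the `Ticket` type, its field names, and the validation rules are assumptions made for the example and are not taken from JIT's actual schema.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Ticket holds the fields a requester fills in. Field names are
// illustrative and not taken from JIT's real schema.
type Ticket struct {
	Requester     string
	Grantees      []string // users who receive access; may include the requester
	AccountID     string
	PermissionSet string
	Start, End    time.Time
	Description   string
}

// Validate checks the basic invariants before a ticket goes to approvers.
func (t Ticket) Validate() error {
	switch {
	case len(t.Grantees) == 0:
		return errors.New("ticket needs at least one grantee")
	case t.AccountID == "" || t.PermissionSet == "":
		return errors.New("account and permission set are required")
	case !t.End.After(t.Start):
		return errors.New("end time must be after start time")
	case t.Description == "":
		return errors.New("description is required")
	}
	return nil
}

// Extend returns a copy of the ticket with a later end time.
// In this sketch an extension can only move the end time forward.
func (t Ticket) Extend(newEnd time.Time) (Ticket, error) {
	if !newEnd.After(t.End) {
		return t, errors.New("new end time must be later than the current one")
	}
	t.End = newEnd
	return t, nil
}

// exampleTicket builds a two-hour ticket that a manager files for two operators.
func exampleTicket() Ticket {
	start := time.Date(2024, 1, 15, 9, 0, 0, 0, time.UTC)
	return Ticket{
		Requester:     "manager@example.com",
		Grantees:      []string{"op1@example.com", "op2@example.com"},
		AccountID:     "111122223333",
		PermissionSet: "MaintenanceAccess",
		Start:         start,
		End:           start.Add(2 * time.Hour),
		Description:   "Patch the database during the maintenance window",
	}
}

func main() {
	tk := exampleTicket()
	fmt.Println("valid:", tk.Validate() == nil)
	longer, err := tk.Extend(tk.End.Add(time.Hour))
	fmt.Println("extended until:", longer.End, "error:", err)
}
```

Keeping the grantees separate from the requester is what lets a team manager file a ticket on behalf of other users.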
As a system administrator, I want to be able to:
Register and unregister accounts in my AWS organization to this system (JIT).
Register and unregister permission sets along with their access scopes to this system (JIT).
Configure approval rules for registered accounts: the number of approval stages, users in each stage, and the number of approvals required in each stage for it to be considered approved.
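To make the approval rule requirement more concrete, here is a minimal sketch of how a multi-stage rule could be evaluated. The `Stage` and `Rule` types and the `Approved` method are hypothetical names chosen for this example; the real JIT implementation may model this differently.

```go
package main

import "fmt"

// Stage is one step of an approval rule: who may approve and how many
// distinct approvals are needed. Names are illustrative.
type Stage struct {
	Approvers []string
	Required  int
}

// Rule is the ordered list of stages configured for a registered account.
type Rule struct {
	AccountID string
	Stages    []Stage
}

// Approved reports whether every stage has collected enough approvals.
// votes[i] lists the users who approved in stage i. Votes from users who
// are not approvers of that stage are ignored, and duplicates count once.
func (r Rule) Approved(votes [][]string) bool {
	if len(r.Stages) == 0 {
		return false // assumption: a rule without stages never approves anything
	}
	for i, stage := range r.Stages {
		if i >= len(votes) {
			return false // this stage has not been voted on yet
		}
		allowed := make(map[string]bool, len(stage.Approvers))
		for _, a := range stage.Approvers {
			allowed[a] = true
		}
		counted := make(map[string]bool)
		for _, voter := range votes[i] {
			if allowed[voter] {
				counted[voter] = true
			}
		}
		if len(counted) < stage.Required {
			return false
		}
	}
	return true
}

// exampleRule needs one of two approvers in stage one and two of three in stage two.
func exampleRule() Rule {
	return Rule{
		AccountID: "111122223333",
		Stages: []Stage{
			{Approvers: []string{"alice", "bob"}, Required: 1},
			{Approvers: []string{"carol", "dave", "erin"}, Required: 2},
		},
	}
}

func main() {
	r := exampleRule()
	fmt.Println(r.Approved([][]string{{"alice"}, {"carol", "dave"}}))
	fmt.Println(r.Approved([][]string{{"alice"}, {"carol"}}))
}
```

Delegation is left out of this sketch; one way to add it would be to resolve a delegate to the approver they stand in for before counting votes.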
As an approver, I want to be able to:
Approve or reject tickets related only to the account(s) I am responsible for.
Have full control over the tickets I have approved: terminate them at any time and audit user activity within the requested time frame for the targeted account.
Let's dive deeper.
We will split JIT into smaller services:
Ticket service
Approval service
Approval rule service
Permission set manager service
Account manager service
Notification service
Audit service
Requester flow
Approver flow
Administrator flow
Demo
The requester creates a ticket with the necessary information.
The approver receives a notification by email and can log in to the JIT portal through an identity provider that supports OpenID Connect. Once the ticket is approved, the user is granted access.
The approver can track user activity during the ticket's time frame.
Discussions
At the heart of JIT is Temporal, which we use to orchestrate background processes handling notifications, approval verification logic, and resource creation logic. Temporal greatly assists us in building durable workflows with multiple parallel logic branches. Additionally, Temporal's workflow status query and workflow signal help simplify a significant amount of implementation code. We can see what's happening in the workflow of each ticket and send signals to it.
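The signal and query pattern described above can be imitated with plain Go channels. The sketch below deliberately does not use the Temporal SDK: `TicketWorkflow`, `Signal`, and `Query` are hypothetical stand-ins meant only to show the idea of a long-running per-ticket process that reacts to external signals and can be asked for its current status. Unlike Temporal, it offers no durability.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is the lifecycle state of a ticket in this sketch.
type Status string

const (
	Pending    Status = "pending"
	Approved   Status = "approved"
	Rejected   Status = "rejected"
	Terminated Status = "terminated"
)

// signal is a message sent to a running workflow; done is closed once
// the workflow has processed it.
type signal struct {
	name string
	done chan struct{}
}

// TicketWorkflow is an in-memory stand-in for one ticket's workflow.
type TicketWorkflow struct {
	mu      sync.Mutex
	status  Status
	signals chan signal
}

// StartTicketWorkflow launches the workflow loop in its own goroutine.
func StartTicketWorkflow() *TicketWorkflow {
	w := &TicketWorkflow{status: Pending, signals: make(chan signal)}
	go w.run()
	return w
}

// run applies signals one at a time; signals that are not valid for the
// current status are ignored.
func (w *TicketWorkflow) run() {
	for s := range w.signals {
		w.mu.Lock()
		switch {
		case w.status == Pending && s.name == "approve":
			w.status = Approved
		case w.status == Pending && s.name == "reject":
			w.status = Rejected
		case w.status == Approved && s.name == "terminate":
			w.status = Terminated
		}
		w.mu.Unlock()
		close(s.done)
	}
}

// Signal delivers a named signal and waits until it has been processed.
func (w *TicketWorkflow) Signal(name string) {
	s := signal{name: name, done: make(chan struct{})}
	w.signals <- s
	<-s.done
}

// Query returns the current status without changing it.
func (w *TicketWorkflow) Query() Status {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.status
}

func main() {
	w := StartTicketWorkflow()
	fmt.Println(w.Query())
	w.Signal("approve")
	fmt.Println(w.Query())
	w.Signal("terminate")
	fmt.Println(w.Query())
}
```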
Since we are only creating account assignments behind the scenes, we've decided that Terraform (a popular IaC tool written in Go) is overkill. Instead, we create resources directly with the AWS Go SDK v2 and utilize Temporal workflows with a simple stack to record which resources were created, allowing us to clean them up afterward.
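The "simple stack" idea can be sketched as follows: every time a resource is created, an undo step is pushed, and when the ticket ends the steps are run in reverse order. `CleanupStack` and `grantAndRevoke` are hypothetical names for this example, and the AWS calls are replaced by a slice append so that the sketch runs without credentials. It is an assumption here that the real steps would wrap IAM Identity Center account assignment creation and deletion.

```go
package main

import (
	"errors"
	"fmt"
)

// Cleanup undoes one created resource, for example by deleting an
// account assignment.
type Cleanup func() error

// CleanupStack records undo steps in creation order.
type CleanupStack struct {
	steps []Cleanup
}

// Push records an undo step for a resource that was just created.
func (s *CleanupStack) Push(c Cleanup) {
	s.steps = append(s.steps, c)
}

// Unwind runs every recorded step, newest first. It keeps going when a
// step fails and returns all failures joined into one error.
func (s *CleanupStack) Unwind() error {
	var errs []error
	for i := len(s.steps) - 1; i >= 0; i-- {
		if err := s.steps[i](); err != nil {
			errs = append(errs, err)
		}
	}
	s.steps = nil
	return errors.Join(errs...)
}

// grantAndRevoke simulates granting access to each user and then revoking
// it. It returns the order in which the assignments were removed.
func grantAndRevoke(users []string) []string {
	var removed []string
	var stack CleanupStack
	for _, user := range users {
		user := user // capture the loop variable for the closure
		// Assumption: a real implementation would create the account
		// assignment through the AWS SDK here.
		stack.Push(func() error {
			removed = append(removed, user)
			return nil
		})
	}
	_ = stack.Unwind()
	return removed
}

func main() {
	fmt.Println(grantAndRevoke([]string{"alice", "bob"}))
}
```

Continuing after a failed step is a design choice in this sketch: one assignment that cannot be deleted should not leave the others in place.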
Authorization in JIT is implemented with Casbin, a popular authorization library also written in Go. Each approval rule and each permission set access scope is converted to a Casbin policy. With careful use of Casbin, we can layer our own customizable logic on top of AWS permission sets.
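As a rough illustration of turning an access scope into policies, the sketch below flattens a scope into (subject, object, action) tuples and checks requests against them. It is a standard-library stand-in, not Casbin itself: `Policy`, `AccessScope`, and `Enforcer` are hypothetical types, and the choice of the permission set as subject, the account as object, and `request` as action is an assumption made for the example.

```go
package main

import "fmt"

// Policy is a (subject, object, action) tuple, the same shape as a rule
// in a basic access control list.
type Policy struct {
	Sub, Obj, Act string
}

// AccessScope says which accounts a permission set may be requested for.
type AccessScope struct {
	PermissionSet string
	AccountIDs    []string
}

// Policies flattens the scope into one policy per account.
func (s AccessScope) Policies() []Policy {
	out := make([]Policy, 0, len(s.AccountIDs))
	for _, id := range s.AccountIDs {
		out = append(out, Policy{Sub: s.PermissionSet, Obj: id, Act: "request"})
	}
	return out
}

// Enforcer answers allow or deny by exact match against stored policies.
type Enforcer struct {
	allowed map[Policy]bool
}

// NewEnforcer loads the policies derived from the given scopes.
func NewEnforcer(scopes ...AccessScope) *Enforcer {
	e := &Enforcer{allowed: make(map[Policy]bool)}
	for _, s := range scopes {
		for _, p := range s.Policies() {
			e.allowed[p] = true
		}
	}
	return e
}

// Enforce reports whether the request matches a stored policy.
// Anything that is not explicitly allowed is denied.
func (e *Enforcer) Enforce(sub, obj, act string) bool {
	return e.allowed[Policy{Sub: sub, Obj: obj, Act: act}]
}

// exampleEnforcer allows ReadOnly on two accounts and AdminAccess on one.
func exampleEnforcer() *Enforcer {
	return NewEnforcer(
		AccessScope{PermissionSet: "ReadOnly", AccountIDs: []string{"111122223333", "444455556666"}},
		AccessScope{PermissionSet: "AdminAccess", AccountIDs: []string{"111122223333"}},
	)
}

func main() {
	e := exampleEnforcer()
	fmt.Println(e.Enforce("ReadOnly", "444455556666", "request"))
	fmt.Println(e.Enforce("AdminAccess", "444455556666", "request"))
}
```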
Building the UI was also a challenge for us, as it needed to be as familiar as possible to end users. Since our enterprise only recently adopted AWS, we want users to feel at home whether they are using the actual AWS console or the JIT console. Fortunately, AWS has open-sourced its design system under the name Cloudscape, so users get an AWS-like experience when using JIT.
Conclusion
Managing access permissions in an AWS Organization can be a daunting task, especially with a growing number of accounts. Creating a custom solution like JIT that is tailored to the specific needs of your organization can significantly streamline the process. By breaking down the system into smaller services and leveraging existing tools and libraries like Temporal, the AWS Go SDK v2, and Casbin, we can build a robust, efficient, and secure access management system. Furthermore, using an AWS-like UI design system such as Cloudscape can provide a familiar user experience. We look forward to sharing JIT with the wider community soon.
Written by Phuong Duy Nguyen