Architecture of a Next.js app on AWS: an interview story

Siddhartha S
8 min read

Introduction

In a recent job interview for an Architect role, I was asked to create a deployment plan for a Next.js application. While deploying the app on Vercel directly from GitHub may seem like the obvious choice, it is not always the best one for enterprise-level applications. Vercel can be costly for high-traffic applications, and it offers less control over security and scalability. Additionally, if the application has multiple components like a database, cache, and file system, the costs can quickly escalate.

In this article, I will share the thought process I used to address the interviewer's concerns and propose a more cost-effective and scalable deployment solution using AWS services. The aim is not only to showcase the various AWS services that can be leveraged, but also to provide an example of how real-world interviews can progress and the thought process that should guide the response.

This article follows an unconventional format, experimenting with an interview-style narrative. I, as the candidate, will present my proposed solutions, and the interviewer will provide feedback and additional constraints to challenge the solution. Through this interactive format, I hope to demonstrate the thought process and adaptability required in a technical interview setting.

The target audience for this article includes DevOps professionals, full-stack developers, and AWS enthusiasts who are interested in exploring deployment strategies for Next.js applications.

Ice Breaker

Interviewer: What are your preferred technologies for full-stack application development?

Me: It depends on the application size and scope. For enterprise-level applications, I prefer C# and Angular/React. For distributed applications with smaller services, I tend to use Golang. And for small to medium-sized projects that require less compute-intensive operations, I often recommend Next.js.

Interviewer: That's interesting. Let's assume we have a medium-sized application, and since you prefer Next.js, how would you suggest deploying the application?

Me: Well, Vercel and similar cloud providers offer great integration with GitHub, providing out-of-the-box environment support and custom domain management. This can be a straightforward solution for a basic Next.js application.

Interviewer: Our application, however, makes use of a database, file system, and a cache. Do you think Vercel is the best option from a cost perspective?

Me: You're right. Considering the complexity of the application with the additional components, using Vercel may not be the most cost-effective solution. In this case, I would suggest leveraging AWS services directly.

💡
With his second question, it became clear that the interviewer was more focused on the DevOps plan than on the technology side or principles of distributed systems. I aimed to provide the simplest possible answer, starting with a brute force approach, even though I recognized it might not be the best solution.
💡
As he continued with subsequent questions, he introduced additional components to the application, making it apparent that choosing Vercel was no longer a viable option.

First Solution

Me: May I assume the database is hosted on RDS?

Interviewer: Be my guest!

Me: I propose the following as the initial solution:

  • An EC2 instance for hosting the Next.js application.

  • Aurora DB, SQS, and EFS (Elastic File System) as managed AWS services, which provide high availability out of the box.

  • Route 53 for DNS management.

  • A GitHub workflow to push the latest build to the EC2 instance.
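As a rough illustration of that last bullet, a minimal GitHub Actions workflow could look like the sketch below. Everything here is an assumption for illustration, not a prescribed setup: the secret names (`EC2_HOST`, `SSH_KEY`), the `/srv/app` path, and the use of `pm2` as the process manager are all placeholders.

```yaml
# Hypothetical deploy workflow — secret names, paths, and pm2 are assumptions.
name: deploy-to-ec2
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Build the Next.js app in CI.
      - run: npm ci && npm run build
      # Copy the build output and manifest to the instance.
      - name: Copy build to the EC2 instance
        run: |
          echo "${{ secrets.SSH_KEY }}" > key.pem && chmod 600 key.pem
          rsync -az -e "ssh -i key.pem -o StrictHostKeyChecking=no" \
            .next package.json ec2-user@${{ secrets.EC2_HOST }}:/srv/app/
      # Install production deps on the instance and restart the process.
      - name: Restart the app
        run: |
          ssh -i key.pem -o StrictHostKeyChecking=no ec2-user@${{ secrets.EC2_HOST }} \
            "cd /srv/app && npm ci --omit=dev && pm2 restart next-app"
```

In practice you might prefer AWS CodeDeploy or SSM Run Command over raw SSH, but the sketch captures the "push the latest build to the EC2 instance" idea.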

Interviewer: This solution has quite a few problems. Firstly, while the managed services are highly available, the EC2 instance isn’t. My traffic follows a pattern where there are only a few hours in the day when the application experiences very high traffic—almost 10 times the usual volume. How should I proceed with a cost-effective solution?

💡
The interviewer clearly wanted a cloud-provider solution, as he kept introducing conditions to advance the discussion. In his feedback he acknowledged the high availability of the managed services but pointed out that a single EC2 instance lacks that same availability, which means the application itself was not highly available.
💡
Additionally, he noted the traffic surge that occurs for only a few hours a day, which likely suggests he had a trading application in mind that operates solely during market hours. Simply adding more EC2 instances to improve availability isn’t a viable option; provisioning EC2 capacity to handle 10 times the usual traffic during peak hours would not be cost-effective.

Challenge 1: Scalability vis-à-vis cost

Me: Since we want to enhance the scalability of our application in an on-demand manner, I propose using an ECS Fargate cluster, as illustrated in the diagram below:

Explanation:

  • The Aurora DB, EFS, and SQS will continue to be managed AWS services, ensuring their availability.

  • ECS (Elastic Container Service) allows you to run containers on demand. During periods of high traffic, ECS services will automatically spin up tasks (containers), and as the load decreases, the extra containers will be terminated. Billing is based on usage, and the underlying hardware is fully managed by AWS, alleviating concerns about infrastructure management.

  • ECS can also run on EC2 instances; however, that mode requires provisioning designated EC2 capacity for the cluster. Given the disproportionate traffic spikes, Fargate, which bills per running task rather than per provisioned instance, is the better fit for this scenario.

  • ECS can connect directly to ECR (Elastic Container Registry), where the built image can be stored and accessed through the GitHub workflow.
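The on-demand scaling behaviour described above can be sketched in Terraform roughly as follows. This is a sketch only: the cluster name, capacity bounds, and the 60% CPU target are illustrative assumptions, and the task definition and networking resources are referenced but omitted for brevity.

```hcl
# Sketch — names, counts, and targets are assumptions, not recommendations.
resource "aws_ecs_cluster" "app" {
  name = "next-app"
}

resource "aws_ecs_service" "app" {
  name            = "next-app"
  cluster         = aws_ecs_cluster.app.id
  task_definition = aws_ecs_task_definition.app.arn # image pulled from ECR; definition omitted
  launch_type     = "FARGATE"
  desired_count   = 2

  network_configuration {
    subnets         = aws_subnet.private[*].id
    security_groups = [aws_security_group.app.id]
  }
}

# Register the service's desired count as a scalable target.
resource "aws_appautoscaling_target" "ecs" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.app.name}/${aws_ecs_service.app.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 20 # headroom for the ~10x peak
}

# Target tracking: ECS adds tasks when average CPU rises, removes them as load drops.
resource "aws_appautoscaling_policy" "cpu" {
  name               = "cpu-target-tracking"
  service_namespace  = aws_appautoscaling_target.ecs.service_namespace
  resource_id        = aws_appautoscaling_target.ecs.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs.scalable_dimension
  policy_type        = "TargetTrackingScaling"

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 60
  }
}
```

Request-count-per-target tracking (via the ALB metric) is an equally valid trigger; CPU is used here only because it needs no load balancer reference.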

Interviewer: While the scalability aspect is addressed, I still don’t see how the application is highly available. Additionally, do you believe the security aspects are adequately handled?

💡
The interviewer was correct in noting that availability has not yet been addressed; the ECS cluster would still operate within the same Availability Zone (AZ), posing a risk of downtime. I overlooked this important aspect.
💡
He also hinted at the necessity of considering the security aspects of the deployed application in the public cloud.

Challenge 2: Security and Availability

Me: My apologies! I overlooked the availability aspect, and I understand that security must also be addressed. I propose the following diagram:

Explanation:

  1. A VPC spans a region, while each subnet (public or private) resides in a single Availability Zone (AZ). By deploying the two ECS clusters in private subnets in different AZs, we enhance the application's availability.

  2. An Application Load Balancer (ALB) is a highly available managed AWS service that can operate across multiple public subnets (and thus multiple AZs). The load balancer will distribute traffic in a round-robin manner across the two clusters.

  3. Since the ECS clusters are now in private subnets, their access to the ECR repository and managed services like Aurora DB, SQS, and EFS is restricted. Therefore, the VPC must include the following endpoints:

    1. A Gateway Endpoint for S3, which ECR uses to store image layers. (Aurora DB itself runs inside the VPC's private subnets, so the tasks reach it directly through security groups; no VPC endpoint is required for it. Gateway endpoints exist only for S3 and DynamoDB.)

    2. An Interface Endpoint for accessing the managed SQS service over a private link.

    3. An EFS mount target for connecting to the EFS.

    4. Interface Endpoints for ECR (both ecr.api and ecr.dkr) for pulling images from the private repository.

  4. The VPC will be connected to an Internet Gateway (IG) to allow external traffic to reach the ALB in the public subnets. As the first point of contact, the ALB will handle SSL offloading, and Route 53 will resolve to it via an alias record pointing at the ALB's DNS name (an ALB does not expose a static Elastic IP).
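The endpoint wiring listed above can be expressed in Terraform roughly as below. This is a hedged sketch: the region string, VPC, subnet, and route-table references are placeholders, and security groups on the interface endpoints are omitted.

```hcl
# Sketch — region and resource references are placeholders.
# ECR stores image layers in S3, so the private subnets need an S3 gateway endpoint.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.eu-west-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

# Interface endpoints keep SQS and ECR traffic on AWS's private network (PrivateLink).
resource "aws_vpc_endpoint" "sqs" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.eu-west-1.sqs"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.eu-west-1.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  private_dns_enabled = true
}

# EFS is reached through mount targets, one per AZ, rather than a VPC endpoint.
resource "aws_efs_mount_target" "app" {
  count          = 2
  file_system_id = aws_efs_file_system.app.id
  subnet_id      = aws_subnet.private[count.index].id
}
```

A matching ecr.api interface endpoint (and one for CloudWatch Logs, if the tasks ship logs) would follow the same pattern as the ecr.dkr block.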

Interviewer: I see that most of the issues are resolved. Is there a way to fetch a report for download using our app that can only be created on our on-premises servers due to confidentiality reasons?

💡
The interviewer is introducing an additional condition related to connecting with the on-premises data center.

Challenge 3: Integration with the on-premises data center

Me: To accommodate the added condition of connectivity with the on-premises data center, we can utilize AWS Direct Connect. The following diagram elaborates on this idea:

Explanation: The VPC connects to the on-premises data center using AWS Direct Connect.
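At a high level, the AWS side of a Direct Connect setup can be sketched in Terraform as below. The physical circuit itself is provisioned out-of-band through AWS or a partner, so this is only the logical wiring; the ASN and names are placeholder assumptions.

```hcl
# Sketch — the physical Direct Connect circuit is ordered separately;
# ASN and names here are placeholders.
resource "aws_vpn_gateway" "vgw" {
  vpc_id = aws_vpc.main.id
}

# A Direct Connect gateway lets the dedicated link reach the VPC.
resource "aws_dx_gateway" "onprem" {
  name            = "onprem-dx-gateway"
  amazon_side_asn = "64512"
}

# Associate the VPC's virtual private gateway with the DX gateway.
resource "aws_dx_gateway_association" "vpc" {
  dx_gateway_id         = aws_dx_gateway.onprem.id
  associated_gateway_id = aws_vpn_gateway.vgw.id
}
```

The private subnets' route tables would then route the on-premises CIDR range through the virtual private gateway, so the report-generation traffic never traverses the public internet.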

Further

The article introduces numerous AWS services and concepts; however, the architecture has been overly simplified for brevity, and many minor details have been omitted. For example:

  1. SSL on the ALB may be managed by AWS Certificate Manager (ACM), which handles renewals automatically upon expiry. Note that an ALB is addressed by its DNS name through a Route 53 alias record; it does not support Elastic IP addresses (a Network Load Balancer would be needed if static IPs were a requirement).

  2. Many flows have been simplified for clarity; the diagrams offer a high-level view of the overall architecture. For instance, the connection from GitHub to ECR involves additional intricacies, and if an Infrastructure as Code (IaC) tool like Terraform is used, more components and services would be involved in the DevOps flow.

  3. Moreover, the subnets within the VPC will need their own route tables and security groups, which have been omitted in the diagrams for the sake of brevity.

  4. There is no single right way of doing things; the same problem could have been solved using EKS (Elastic Kubernetes Service). However, running Kubernetes for a single Next.js application seems like overkill, and ECS fits the bill perfectly.
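To make the ACM and Route 53 pieces mentioned in this list concrete, a minimal Terraform sketch might look like the following. The domain name and resource references are placeholders, and the DNS validation records for the certificate are omitted.

```hcl
# Sketch — domain and resource references are placeholders.
resource "aws_acm_certificate" "app" {
  domain_name       = "app.example.com"
  validation_method = "DNS"
}

# HTTPS listener on the ALB performs SSL offloading with the ACM certificate.
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.app.arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.ecs.arn
  }
}

# Route 53 alias record resolving the domain to the ALB's DNS name.
resource "aws_route53_record" "app" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.example.com"
  type    = "A"

  alias {
    name                   = aws_lb.app.dns_name
    zone_id                = aws_lb.app.zone_id
    evaluate_target_health = true
  }
}
```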

Conclusion

In this article, we explored how an enterprise-level deployment can be structured for a Next.js application. We discussed various AWS services and concepts, including ECS (with its services and tasks), VPCs, subnets, and the gateway and interface endpoint types. Additionally, we examined AWS Direct Connect, which connects an AWS VPC to an on-premises data center.

Throughout the interview, we saw that offering a minimal initial answer lets the interviewer guide the discussion toward what they actually want to explore. A skilled interviewer is adept at prompting the interviewee to elicit the answers they are seeking.

I hope this article has offered you valuable insights into the enterprise deployment strategies employed by large organizations. Even a small application built with Next.js requires robust infrastructure to operate in a scalable, available, and secure manner.


Written by

Siddhartha S

With over 18 years of experience in IT, I specialize in designing and building powerful, scalable solutions using a wide range of technologies like JavaScript, .NET, C#, React, Next.js, Golang, AWS, Networking, Databases, DevOps, Kubernetes, and Docker. My career has taken me through various industries, including Manufacturing and Media, but for the last 10 years, I’ve focused on delivering cutting-edge solutions in the Finance sector. As an application architect, I combine cloud expertise with a deep understanding of systems to create solutions that are not only built for today but prepared for tomorrow. My diverse technical background allows me to connect development and infrastructure seamlessly, ensuring businesses can innovate and scale effectively. I’m passionate about creating architectures that are secure, resilient, and efficient—solutions that help businesses turn ideas into reality while staying future-ready.