AI Engineering: Serverless FastAPI deployment with AWS ECR & AWS Lambda


Hello Techies👋! I’m Samiksha, hope you all are doing amazing stuff. I’m back with another blog about best MLOps practices for AI engineering. Today I’m covering a deployment strategy for your AI application: a step-by-step guide to deploying your FastAPI application as a container, using AWS ECR for container storage and AWS Lambda for serverless compute.
Please NOTE: This article is aimed at intermediate AI professionals building AI products. Prerequisites: basic knowledge of Docker, AWS services, and containerization.
If you are a beginner and want to learn Docker and containerization, check out this blog, which covers building a Dockerfile for your application: https://teckbakers.hashnode.dev/ai-engineering-best-practices-to-create-a-dockerfile-you-ever-know. For detailed DevOps practices covering AWS and containerization, check out the teckbakers DevOps & Cloud series here: https://teckbakers.hashnode.dev/series/devops.
Let’s deploy your REST API application using AWS ECR and AWS Lambda. After following this article, anyone can deploy their REST API on AWS.
Let’s gooo…. As this is a step-by-step practical deployment article, I have referenced a production-ready full-stack RAG application; check out the code here: https://github.com/kolhesamiksha/Hybrid-Search-RAG. To learn more about best practices for building production-ready RAG, check out this article: https://teckbakers.hashnode.dev/ai-consultant-hybrid-rag-chatbot.
Yep, before starting, let’s understand the basic terminology first.
What is RestAPI?
A REST API is an API (application programming interface) that follows REST design principles. REST stands for representational state transfer. Let's break this down:
Representational: Any content we can name is a resource. For example,
https://sampleapi.com/customers
is a resource. A resource encoded in a certain format (such as XML or JSON) is a resource representation. REST provides the representation of the resource to the client.
State Transfer: Each request from the client to the server must contain all the information needed to parse and complete that request. This information includes things such as the API key, user agent, and HTTP version, and is referred to as the session state; it can vary by client and request. In REST, the state must be transferred with every request and cannot be stored on the server. We might think of it as the server meeting the client for the first time, every time.
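To make the idea of a resource representation concrete, here is a minimal sketch using only the Python standard library. The `/customers` resource and its contents are hypothetical examples, not part of any real API:

```python
import json

# A hypothetical /customers resource. The URI (https://sampleapi.com/customers)
# *names* the resource; the JSON string below is one *representation* of its
# current state, which the server transfers to the client.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]

representation = json.dumps({"customers": customers})
print(representation)
```

The same resource could just as well be represented in XML; the representation format is negotiated between client and server, independent of the resource itself.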
Why do SaaS teams and products use REST APIs?
REST APIs use the HTTP protocol for data transfer, the same protocol that drives access to the billions of web pages people view daily. As a result, many developers find REST APIs attractive because they are already familiar with HTTP.
And it's easy to work with a REST API. Devs can use curl, Postman, or any number of HTTP clients to set up requests with a REST API in a few seconds.
Adding to the overall ease of use, many REST APIs follow OpenAPI standards – making it straightforward for developers to work with an API without first studying extensive technical documentation.
Why Containerization is needed for your AI application?
Containerization is needed for AI applications due to portability across different environments, scalability for fluctuating workloads, reproducibility for consistent development and deployment, and isolation for efficient resource use and stable operation. AI models require specific dependencies, which containers package to ensure they run reliably and efficiently anywhere—from a developer's laptop to cloud servers—enabling faster deployment, easier collaboration, and better management of complex AI workflows.
Here's a breakdown of the key reasons:
Portability:
AI applications often have complex dependencies and require specific environments and libraries. Containerization bundles the AI model, its code, and all its dependencies into a single, portable unit that can run on any machine or cloud without modification.
Scalability:
AI workloads can be highly variable. Containers allow for rapid provisioning of new instances to handle increased demand and easy scaling down when demand decreases, ensuring the application remains responsive.
Reproducibility & Consistency:
Containers create a consistent and isolated environment for the AI application. This ensures that the application behaves the same way during development, testing, and production, reducing "it works on my machine" problems and making the entire AI lifecycle more predictable and reliable.
Resource Efficiency:
Containers are lightweight compared to virtual machines because they share the host machine's operating system kernel. This efficiency allows more containers to run on the same hardware, optimizing resource utilization for resource-intensive AI workloads.
Isolation:
Each containerized application runs in its own isolated environment. This prevents conflicts between different AI models or their dependencies and limits the impact of bugs within one container, ensuring the stability of the entire system.
Facilitates MLOps:
Containerization is a foundational element of MLOps (Machine Learning Operations). It helps bridge the gap between research and production by enabling efficient deployment, automation, and continuous integration/continuous deployment (CI/CD) for AI models.
Support for Specialized Hardware:
Tools like the NVIDIA Container Toolkit allow containers to efficiently utilize specialized hardware like GPUs, which are essential for training and deploying many AI models, making containerization a standard for GPU-accelerated AI applications.
Why AWS Lambda to deploy the RestAPI?
AWS Lambda is a serverless, event-driven compute service provided by Amazon Web Services (AWS). It allows users to run code without provisioning or managing servers. Instead, Lambda automatically manages the underlying compute resources, including server and operating system maintenance, capacity provisioning, automatic scaling, code and security patch deployment, and monitoring and logging.
Key features of AWS Lambda include:
Serverless:
Users do not need to manage any servers or infrastructure. Lambda handles all the underlying compute resources.
Event-driven:
Lambda functions are triggered by events, such as changes in data in an Amazon S3 bucket, updates to an Amazon DynamoDB table, or HTTP requests via Amazon API Gateway.
Automatic scaling:
Lambda automatically scales the resources allocated to a function based on the incoming request or event traffic.
Pay-per-use pricing:
Users are charged only for the compute time consumed when their code is running, with no charge when the code is inactive.
Support for multiple languages:
Lambda supports various programming languages, including Node.js, Python, Java, Go, C#, and Ruby.
Integration with other AWS services:
Lambda can be easily integrated with a wide range of other AWS services to build complex, scalable applications.
Now, let’s build a Lambda function for your FastAPI application step by step.
Read this guide from AWS for deeper knowledge about security and permissions for your solution:
https://docs.aws.amazon.com/lambda/latest/dg/images-create.html
There are two ways to deploy our application on AWS Lambda: 1. uploading a .zip file of your application with a custom handler in the code, or 2. a containerization approach through AWS ECR.
Pre-requisites to run Container Image on AWS Lambda -
To create a Lambda function from a container image, build your image locally and upload it to an Amazon Elastic Container Registry (Amazon ECR) repository; if the image lives elsewhere, copy it to your private Amazon ECR repository first. Then specify the repository URI when you create the function. The Amazon ECR repository must be in the same AWS Region as the Lambda function. You can create a function using an image in a different AWS account, as long as the image is in the same Region as the Lambda function.
Create an AWS Lambda Function:
Access the Lambda Console: Sign in to the AWS Management Console and navigate to the Lambda service.
Initiate Function Creation: On the Lambda console dashboard, choose "Create function."
Select Creation Method: Choose "Author from scratch" to define your function from the ground up. Other options, such as blueprints or "Container image", are available for specific use cases; for the container-based deployment covered later in this article, you would choose "Container image" and point it at your ECR image instead of selecting a runtime.
Configure Basic Settings:
Function name: Provide a unique and descriptive name for your Lambda function.
Runtime: Select the programming language and runtime version for your function (e.g., Python 3.9, Node.js 18.x).
Architecture: Choose the desired processor architecture (e.g., x86_64 or arm64).
Define Permissions:
Execution role: Determine the permissions your Lambda function will have to interact with other AWS services. You can choose to:
Create a new role with basic Lambda permissions.
Create a new role from AWS policy templates for common use cases (e.g., S3 read permissions).
Use an existing IAM role.
Create the Function: After configuring the settings and permissions, choose "Create function."
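The same function can also be created from the AWS CLI. This is a hedged sketch (the function name, account ID, role name, and region are placeholders you must replace with your own values, and it assumes the ECR image from later in this article already exists):

```shell
# Hypothetical example: creating a container-image Lambda function from the CLI.
# Replace the account ID, region, repo name, and IAM role with your own values.
aws lambda create-function \
  --function-name fastapi-rag-chatbot \
  --package-type Image \
  --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/fastapi-rag-chatbot:latest \
  --role arn:aws:iam::123456789012:role/lambda-execution-role \
  --region us-east-1
```

These commands require valid AWS credentials with Lambda and ECR permissions, so run them only after the prerequisites below are in place.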
Using a .zip file: follow this guide if you want to build the AWS Lambda function by uploading a zip file to AWS S3 and then referencing it from the Lambda function: https://docs.aws.amazon.com/lambda/latest/dg/python-package.html.
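As a rough sketch of that .zip flow (package names, file names, and the bucket name are placeholders; see the AWS docs linked above for the authoritative steps):

```shell
# Install dependencies into a local folder, then zip them with your code.
pip install --target ./package fastapi mangum
cd package && zip -r ../deployment.zip . && cd ..
zip deployment.zip main.py   # add your application entry point

# Upload the archive to S3 so the Lambda function can reference it.
aws s3 cp deployment.zip s3://your-bucket/deployment.zip
```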
- Create a Lambda function handler in Python. Below is the detailed AWS guide on best practices for building a function handler to serve your FastAPI events:
https://docs.aws.amazon.com/lambda/latest/dg/python-handler.html
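To show the shape of such a handler, here is a minimal stdlib-only sketch that answers an API Gateway proxy-style event. The event fields and routes are illustrative; in a real FastAPI deployment, an ASGI adapter such as Mangum plays this role and dispatches to your FastAPI routes instead:

```python
import json

def lambda_handler(event, context):
    """Minimal API Gateway proxy-style handler.

    `event` carries the whole request (path, method, headers, body),
    so the function stays stateless, as REST requires.
    """
    path = event.get("path", "/")
    if path == "/health":
        status, body = 200, {"status": "ok"}
    else:
        status, body = 404, {"error": "not found"}
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }

# Local smoke test with a fake event; in production, Lambda supplies real ones.
response = lambda_handler({"path": "/health"}, None)
print(response["statusCode"])  # → 200
```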
But our main focus is to Dockerize the FastAPI application and then deploy that image to the Lambda function:
Using Docker - AWS ECR:
Prerequisites
AWS Credentials with permissions to interact with ECR.
Docker installed for your operating system.
The AWS CLI installed locally.
Create an AWS ECR registry -
Follow the below Article to create a container registry to store the versions of your Images on AWS: https://aws.plainenglish.io/how-to-create-elastic-container-repository-ecr-in-aws-8b04fa98f130
After Creating the container registry, follow below steps one by one:
- Create a Dockerfile for your application that exposes the FastAPI app. Install Docker on your local system first, then follow the commands below one by one.
NOTE: to learn how to create a Dockerfile for your application, see my earlier article that explains it in detail: https://teckbakers.hashnode.dev/ai-engineering-best-practices-to-create-a-dockerfile-you-ever-know.
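As an illustration, a Lambda-compatible Dockerfile built on the AWS Python base image might look like the sketch below. It assumes a `main.py` that wraps your FastAPI `app` with the Mangum adapter as `handler = Mangum(app)`; adjust the file names and Python version to your project:

```dockerfile
# AWS-provided Python base image that includes the Lambda runtime interface client
FROM public.ecr.aws/lambda/python:3.11

# Install dependencies into the Lambda task root
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (main.py is assumed to define: handler = Mangum(app))
COPY main.py ${LAMBDA_TASK_ROOT}

# Tell Lambda which module:function to invoke
CMD ["main.handler"]
```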
- Build the Docker image first: this downloads all the necessary dependencies, metadata, and system requirements into the container image.
Now, through the AWS CLI, authenticate Docker against the ECR repository we created above.
Make sure the Docker image is tagged with the full ECR repository URI; this is mandatory for pushing the image to the ECR registry.
Now push the image to ECR.
You can see the image, with its tag as a version (latest), pushed to the registry we created on AWS ECR.
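The build, login, tag, and push steps above can be sketched as the commands below. The account ID, region, and repository name are placeholders, and the commands assume Docker and AWS CLI v2 are installed with credentials that have ECR permissions:

```shell
# Hypothetical values: replace with your own account ID, region, and repo name.
AWS_ACCOUNT=123456789012
REGION=us-east-1
REPO=fastapi-rag-chatbot

# 1. Build the image locally from your Dockerfile
docker build -t $REPO:latest .

# 2. Authenticate Docker to your private ECR registry
aws ecr get-login-password --region $REGION | \
  docker login --username AWS --password-stdin $AWS_ACCOUNT.dkr.ecr.$REGION.amazonaws.com

# 3. Tag the image with the full ECR repository URI
docker tag $REPO:latest $AWS_ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest

# 4. Push the tagged image to ECR
docker push $AWS_ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest
```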
So far we have created two things: the AWS Lambda function, and our Dockerized code image pushed to the AWS ECR registry. Now, to trigger the Lambda function, let’s build an HTTP gateway that manages API requests to the Lambda function; this gives us an event-driven architecture for running the FastAPI app.
Create an API Gateway
Create an API Gateway as a traffic router for your API requests; it will trigger the AWS Lambda function, which hits the FastAPI application and returns the response to us.
Provide the APIs you want to expose. Here I have added /health to check the API status, and /chatbot as an endpoint that internally hits the /predict endpoint of the FastAPI application. List all the endpoints you want to trigger here.
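Once the gateway is deployed, you can exercise the routes with curl. The invoke URL below is a made-up placeholder; API Gateway shows your real one on the stage page after deployment, and the request body shape depends on your own FastAPI schema:

```shell
# Hypothetical invoke URL; substitute the one API Gateway gives you.
curl https://abc123.execute-api.us-east-1.amazonaws.com/health

curl -X POST https://abc123.execute-api.us-east-1.amazonaws.com/chatbot \
  -H "Content-Type: application/json" \
  -d '{"query": "Hello"}'
```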
Yeahhh🙌🏻, you have deployed your first serverless service using AWS Lambda and ECR!
Please feel free to contribute to this article in the comments, and share your insights and experience in optimizing Docker images for your large-scale applications. This will help everyone learn from each other’s experience!
Till then, stay tuned and follow our newsletter to get daily updates and build projects end to end! Connect with me on LinkedIn, GitHub, and Kaggle.
Let's Learn and grow together:) Stay Healthy stay Happy✨. Happy Learning!!
Written by Samiksha Kolhe