Deploying a Machine Learning Model on AWS with SageMaker: A DevOps Guide

Machine learning (ML) is reshaping industries, and organizations are racing to integrate ML capabilities into their products. However, transitioning from a trained model in a Jupyter notebook to a scalable, production-ready deployment is a significant challenge. This is where AWS SageMaker comes into play. In this article, we will walk through the process of deploying a machine learning model on AWS SageMaker and serving it via a REST API using Amazon API Gateway and a Docker container. This guide is especially tailored for DevOps professionals looking to streamline ML model deployment in a production environment.
Introduction to AWS SageMaker
AWS SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly. It supports a wide range of ML frameworks and comes with built-in algorithms, Jupyter notebooks, and integration capabilities with other AWS services.
SageMaker provides several deployment options:
- Hosted endpoints for real-time predictions.
- Batch Transform for offline predictions.
- Multi-Model Endpoints to deploy multiple models on the same infrastructure.
In this guide, we focus on deploying a model to a real-time hosted endpoint.
Overview of the Deployment Workflow
Here is a high-level overview of what we will accomplish:
- Train and export your ML model (e.g., scikit-learn, TensorFlow, PyTorch).
- Build a Docker container that serves the model using a predictor.py and a Dockerfile.
- Push the Docker container to Amazon ECR.
- Deploy the container as a SageMaker endpoint.
- Create a RESTful API using Amazon API Gateway.
- Secure and monitor the deployment.
Preparing the Model
Before we deploy, ensure you have a trained model serialized to a file format your inference code can read (e.g., .pkl, .joblib, .h5). Here's a simple example using scikit-learn:
# train_model.py
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small classifier on the Iris dataset and serialize it for serving
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier()
model.fit(X, y)
joblib.dump(model, 'model.joblib')
When a SageMaker model points at artifacts in S3 (via ModelDataUrl, used later when we create the model), SageMaker expects a model.tar.gz archive, which it extracts into /opt/ml/model inside the container. Package the model and upload it to an S3 bucket:
tar -czf model.tar.gz model.joblib
aws s3 cp model.tar.gz s3://your-bucket-name/model/model.tar.gz
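If you prefer to do the packaging and upload from Python, here is a minimal equivalent sketch using the standard library's tarfile module and boto3 (the bucket name is a placeholder):
# package_and_upload.py - hypothetical helper; replace the bucket name with your own
import tarfile
import boto3

# SageMaker expects model artifacts bundled as a gzipped tarball
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model.joblib")

boto3.client("s3").upload_file(
    "model.tar.gz", "your-bucket-name", "model/model.tar.gz"
)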
Building a Docker Container for SageMaker
Create a project structure like this:
ml-docker/
├── Dockerfile
├── predictor.py
├── serve
├── model.joblib (optional, can be loaded from S3)
└── requirements.txt
predictor.py
This script loads the model and handles prediction logic.
# predictor.py
import json
import os

import joblib

# Path where SageMaker extracts the model artifacts; overridable via the MODEL_PATH env var set in the Dockerfile
MODEL_PATH = os.environ.get("MODEL_PATH", "/opt/ml/model/model.joblib")

# Load the model once at container start-up so each request only runs inference
model = joblib.load(MODEL_PATH)

def predict(input_data):
    # input_data is the raw JSON request body, e.g. {"features": [5.1, 3.5, 1.4, 0.2]}
    payload = json.loads(input_data)
    prediction = model.predict([payload["features"]])
    return {"prediction": prediction.tolist()}
serve script
This small Flask app exposes the two routes SageMaker requires of an inference container: GET /ping for health checks and POST /invocations for predictions, served on port 8080.
#!/usr/bin/env python3
# serve
from flask import Flask, request, jsonify

from predictor import predict

app = Flask(__name__)

# SageMaker calls GET /ping to verify the container is healthy
@app.route("/ping", methods=["GET"])
def ping():
    return "pong", 200

# SageMaker forwards inference requests to POST /invocations
@app.route("/invocations", methods=["POST"])
def invocations():
    return jsonify(predict(request.data))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
requirements.txt
# Pin scikit-learn and joblib to the versions used during training so the pickled model loads cleanly
flask
scikit-learn
joblib
Dockerfile
FROM python:3.9
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
ENV MODEL_PATH="/opt/ml/model/model.joblib"
EXPOSE 8080
# SageMaker starts the inference container with the argument "serve",
# so use ENTRYPOINT (not CMD) to guarantee the Flask server is what runs
ENTRYPOINT ["python", "serve"]
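Before pushing, it's worth smoke-testing the container locally. Assuming you've built the image and started it with the model mounted in (for example, something like docker run -p 8080:8080 -v "$PWD/model.joblib:/opt/ml/model/model.joblib" ml-deploy-demo), a quick check from the host could look like this:
# local_smoke_test.py - hypothetical check against the locally running container
import json
import requests

# Health check route (the same one SageMaker will call)
print(requests.get("http://localhost:8080/ping").text)  # expect "pong"

# Inference route with a sample Iris payload
payload = {"features": [5.1, 3.5, 1.4, 0.2]}
resp = requests.post(
    "http://localhost:8080/invocations",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.json())  # e.g. {"prediction": [0]}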
Pushing to Amazon ECR
- Authenticate Docker with ECR:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com
- Create a new ECR repository:
aws ecr create-repository --repository-name ml-deploy-demo
- Build and push the Docker image:
docker build -t ml-deploy-demo .
docker tag ml-deploy-demo:latest <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/ml-deploy-demo:latest
docker push <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/ml-deploy-demo:latest
Creating a SageMaker Model Endpoint
- Create a SageMaker Model
import boto3

sagemaker = boto3.client('sagemaker')

model_name = 'ml-deploy-model'
role = 'arn:aws:iam::<aws_account_id>:role/SageMakerExecutionRole'
image_uri = '<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/ml-deploy-demo:latest'

sagemaker.create_model(
    ModelName=model_name,
    PrimaryContainer={
        'Image': image_uri,
        # SageMaker extracts this archive into /opt/ml/model inside the container
        'ModelDataUrl': 's3://your-bucket-name/model/model.tar.gz',
    },
    ExecutionRoleArn=role
)
- Create Endpoint Configuration
endpoint_config_name = 'ml-endpoint-config'

sagemaker.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            'VariantName': 'AllTraffic',
            'ModelName': model_name,
            'InstanceType': 'ml.m5.large',
            'InitialInstanceCount': 1
        }
    ]
)
- Deploy the Endpoint
endpoint_name = 'ml-endpoint'

sagemaker.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)
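Endpoint creation usually takes a few minutes. As a quick check before wiring up API Gateway, the sketch below (continuing the same script) waits for the endpoint to become InService and then invokes it directly with a sample Iris payload:
import json

runtime = boto3.client('sagemaker-runtime')

# Block until the endpoint is InService (raises if creation fails)
sagemaker.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)

# Invoke the endpoint directly, bypassing API Gateway, to confirm it works end to end
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=json.dumps({"features": [5.1, 3.5, 1.4, 0.2]})
)
print(json.loads(response['Body'].read()))  # e.g. {'prediction': [0]}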
Exposing the Endpoint via API Gateway
- Create a Lambda function that proxies requests to the SageMaker endpoint (its execution role needs sagemaker:InvokeEndpoint permission on that endpoint):
import boto3
import json

runtime = boto3.client('sagemaker-runtime')

def lambda_handler(event, context):
    # API Gateway proxy integration passes the request body as a JSON string
    body = json.loads(event['body'])

    # Forward the payload to the SageMaker endpoint
    response = runtime.invoke_endpoint(
        EndpointName='ml-endpoint',
        ContentType='application/json',
        Body=json.dumps(body)
    )

    result = json.loads(response['Body'].read())
    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }
- Create an API Gateway REST API:
- Integrate the POST method with the Lambda function.
- Enable CORS.
- Deploy the API to a new stage (e.g., prod).
Your endpoint will now be publicly accessible, for example:
POST https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict
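To confirm the whole chain works (API Gateway → Lambda → SageMaker), a quick test against the example URL above (substitute your own invoke URL):
# hypothetical end-to-end test; replace the URL with your API Gateway invoke URL
import requests

url = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict"
resp = requests.post(url, json={"features": [5.1, 3.5, 1.4, 0.2]})
print(resp.status_code, resp.json())  # expect 200 and e.g. {'prediction': [0]}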
Security and IAM Considerations
- Use IAM roles to grant least-privilege access to SageMaker and S3.
- Enable VPC endpoints for secure internal communication.
- Use API Gateway usage plans and API keys to throttle access.
- Enable CloudTrail logging for monitoring access.
Monitoring and Logging
- CloudWatch Logs: logs from SageMaker and Lambda can be viewed here.
- SageMaker metrics: monitor latency, error rates, CPU usage, etc.
- Enable Model Monitor to detect data drift and anomalies in real-time predictions.
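Endpoint invocation and latency metrics live in the AWS/SageMaker CloudWatch namespace. A minimal sketch for pulling them with boto3 (the endpoint and variant names match the ones created above):
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client('cloudwatch')

# Average model latency over the last hour for our endpoint variant
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/SageMaker',
    MetricName='ModelLatency',
    Dimensions=[
        {'Name': 'EndpointName', 'Value': 'ml-endpoint'},
        {'Name': 'VariantName', 'Value': 'AllTraffic'},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=['Average'],
)
print(stats['Datapoints'])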
Conclusion
Deploying machine learning models should be as reproducible and scalable as deploying application code. With AWS SageMaker, Docker, and API Gateway, DevOps engineers and data scientists can work together to streamline ML deployment pipelines.
By containerizing the model server, automating deployment with SageMaker, and exposing a secure REST API via API Gateway, your models are now production-ready, scalable, and observable. As you continue to iterate on your ML solutions, consider integrating CI/CD pipelines with CodePipeline and CodeBuild for automated testing and deployment.
Want to take it a step further? Try automating the entire process with Terraform or AWS CDK. And don’t forget to monitor and retrain your models regularly to ensure continued accuracy and relevance.