Deploying a Machine Learning Model on AWS with SageMaker: A DevOps Guide

Machine learning (ML) is reshaping industries, and organizations are racing to integrate ML capabilities into their products. However, transitioning from a trained model in a Jupyter notebook to a scalable, production-ready deployment is a significant challenge. This is where AWS SageMaker comes into play. In this article, we will walk through the process of deploying a machine learning model on AWS SageMaker and serving it via a REST API using Amazon API Gateway and a Docker container. This guide is especially tailored for DevOps professionals looking to streamline ML model deployment in a production environment.
Introduction to AWS SageMaker
AWS SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly. It supports a wide range of ML frameworks and comes with built-in algorithms, Jupyter notebooks, and integration capabilities with other AWS services.
SageMaker provides several deployment options:
- Hosted endpoints for real-time predictions.
- Batch Transform for offline predictions.
- Multi-Model Endpoints to deploy multiple models on the same infrastructure.
In this guide, we focus on deploying a model to a real-time hosted endpoint.
Overview of the Deployment Workflow
Here is a high-level overview of what we will accomplish:
- Train and export your ML model (e.g., scikit-learn, TensorFlow, PyTorch).
- Build a Docker container that serves the model using a predictor.py and a Dockerfile.
- Push the Docker container to Amazon ECR.
- Deploy the container as a SageMaker endpoint.
- Create a RESTful API using Amazon API Gateway.
- Secure and monitor the deployment.
Preparing the Model
Before we deploy, ensure you have a trained model serialized to a file format your inference code can read (e.g., .pkl, .joblib, .h5). Here's a simple example using scikit-learn:
# train_model.py
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a small classifier on the Iris dataset and serialize it for serving
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier()
model.fit(X, y)
joblib.dump(model, 'model.joblib')
When a SageMaker model points at artifacts in S3 (via ModelDataUrl, used later when we create the model), SageMaker expects a model.tar.gz archive, which it extracts into /opt/ml/model inside the container. Package the model and upload it to an S3 bucket:
tar -czf model.tar.gz model.joblib
aws s3 cp model.tar.gz s3://your-bucket-name/model/model.tar.gz
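If you prefer to do the packaging and upload from Python, here is a minimal equivalent sketch using the standard library's tarfile module and boto3 (the bucket name is a placeholder):
# package_and_upload.py - hypothetical helper; replace the bucket name with your own
import tarfile
import boto3

# SageMaker expects model artifacts bundled as a gzipped tarball
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("model.joblib")

boto3.client("s3").upload_file(
    "model.tar.gz", "your-bucket-name", "model/model.tar.gz"
)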
Building a Docker Container for SageMaker
Create a project structure like this:
ml-docker/
├── Dockerfile
├── predictor.py
├── serve
├── model.joblib (optional, can be loaded from S3)
└── requirements.txt
predictor.py
This script loads the model and handles prediction logic.
# predictor.py
import json
import os

import joblib

# Path where SageMaker extracts the model artifacts; overridable via the MODEL_PATH env var set in the Dockerfile
MODEL_PATH = os.environ.get("MODEL_PATH", "/opt/ml/model/model.joblib")

# Load the model once at container start-up so each request only runs inference
model = joblib.load(MODEL_PATH)

def predict(input_data):
    # input_data is the raw JSON request body, e.g. {"features": [5.1, 3.5, 1.4, 0.2]}
    payload = json.loads(input_data)
    prediction = model.predict([payload["features"]])
    return {"prediction": prediction.tolist()}
serve script
This small Flask app exposes the two routes SageMaker requires of an inference container: GET /ping for health checks and POST /invocations for predictions, served on port 8080.
#!/usr/bin/env python3
# serve
from flask import Flask, request, jsonify

from predictor import predict

app = Flask(__name__)

# SageMaker calls GET /ping to verify the container is healthy
@app.route("/ping", methods=["GET"])
def ping():
    return "pong", 200

# SageMaker forwards inference requests to POST /invocations
@app.route("/invocations", methods=["POST"])
def invocations():
    return jsonify(predict(request.data))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
requirements.txt
# Pin scikit-learn and joblib to the versions used during training so the pickled model loads cleanly
flask
scikit-learn
joblib
Dockerfile
FROM python:3.9
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
ENV MODEL_PATH="/opt/ml/model/model.joblib"
EXPOSE 8080
# SageMaker starts the inference container with the argument "serve",
# so use ENTRYPOINT (not CMD) to guarantee the Flask server is what runs
ENTRYPOINT ["python", "serve"]
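Before pushing, it's worth smoke-testing the container locally. Assuming you've built the image and started it with the model mounted in (for example, something like docker run -p 8080:8080 -v "$PWD/model.joblib:/opt/ml/model/model.joblib" ml-deploy-demo), a quick check from the host could look like this:
# local_smoke_test.py - hypothetical check against the locally running container
import json
import requests

# Health check route (the same one SageMaker will call)
print(requests.get("http://localhost:8080/ping").text)  # expect "pong"

# Inference route with a sample Iris payload
payload = {"features": [5.1, 3.5, 1.4, 0.2]}
resp = requests.post(
    "http://localhost:8080/invocations",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)
print(resp.json())  # e.g. {"prediction": [0]}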
Pushing to Amazon ECR
- Authenticate Docker with ECR:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com
- Create a new ECR repository:
aws ecr create-repository --repository-name ml-deploy-demo
- Build and push the Docker image:
docker build -t ml-deploy-demo .
docker tag ml-deploy-demo:latest <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/ml-deploy-demo:latest
docker push <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/ml-deploy-demo:latest
Creating a SageMaker Model Endpoint
- Create a SageMaker Model
import boto3

sagemaker = boto3.client('sagemaker')

model_name = 'ml-deploy-model'
role = 'arn:aws:iam::<aws_account_id>:role/SageMakerExecutionRole'
image_uri = '<aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/ml-deploy-demo:latest'

sagemaker.create_model(
    ModelName=model_name,
    PrimaryContainer={
        'Image': image_uri,
        # SageMaker extracts this archive into /opt/ml/model inside the container
        'ModelDataUrl': 's3://your-bucket-name/model/model.tar.gz',
    },
    ExecutionRoleArn=role
)
- Create Endpoint Configuration
endpoint_config_name = 'ml-endpoint-config'

sagemaker.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            'VariantName': 'AllTraffic',
            'ModelName': model_name,
            'InstanceType': 'ml.m5.large',
            'InitialInstanceCount': 1
        }
    ]
)
- Deploy the Endpoint
endpoint_name = 'ml-endpoint'

sagemaker.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name
)
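Endpoint creation usually takes a few minutes. As a quick check before wiring up API Gateway, the sketch below (continuing the same script) waits for the endpoint to become InService and then invokes it directly with a sample Iris payload:
import json

runtime = boto3.client('sagemaker-runtime')

# Block until the endpoint is InService (raises if creation fails)
sagemaker.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)

# Invoke the endpoint directly, bypassing API Gateway, to confirm it works end to end
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='application/json',
    Body=json.dumps({"features": [5.1, 3.5, 1.4, 0.2]})
)
print(json.loads(response['Body'].read()))  # e.g. {'prediction': [0]}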
Exposing the Endpoint via API Gateway
- Create a Lambda function that proxies requests to the SageMaker endpoint (its execution role needs sagemaker:InvokeEndpoint permission on that endpoint):
import boto3
import json

runtime = boto3.client('sagemaker-runtime')

def lambda_handler(event, context):
    # API Gateway proxy integration passes the request body as a JSON string
    body = json.loads(event['body'])

    # Forward the payload to the SageMaker endpoint
    response = runtime.invoke_endpoint(
        EndpointName='ml-endpoint',
        ContentType='application/json',
        Body=json.dumps(body)
    )

    result = json.loads(response['Body'].read())
    return {
        'statusCode': 200,
        'body': json.dumps(result)
    }
- Create an API Gateway REST API:
- Integrate the POST method with the Lambda function.
- Enable CORS.
- Deploy the API to a new stage (e.g., prod).
Your endpoint will now be publicly accessible, for example:
POST https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict
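To confirm the whole chain works (API Gateway → Lambda → SageMaker), a quick test against the example URL above (substitute your own invoke URL):
# hypothetical end-to-end test; replace the URL with your API Gateway invoke URL
import requests

url = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/predict"
resp = requests.post(url, json={"features": [5.1, 3.5, 1.4, 0.2]})
print(resp.status_code, resp.json())  # expect 200 and e.g. {'prediction': [0]}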
Security and IAM Considerations
- Use IAM roles to grant least-privilege access to SageMaker and S3.
- Enable VPC endpoints for secure internal communication.
- Use API Gateway usage plans and API keys to throttle access.
- Enable CloudTrail logging for monitoring access.
Monitoring and Logging
- CloudWatch Logs: logs from SageMaker and Lambda can be viewed here.
- SageMaker metrics: monitor latency, error rates, CPU usage, etc.
- Enable Model Monitor to detect data drift and anomalies in real-time predictions.
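Endpoint invocation and latency metrics live in the AWS/SageMaker CloudWatch namespace. A minimal sketch for pulling them with boto3 (the endpoint and variant names match the ones created above):
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client('cloudwatch')

# Average model latency over the last hour for our endpoint variant
stats = cloudwatch.get_metric_statistics(
    Namespace='AWS/SageMaker',
    MetricName='ModelLatency',
    Dimensions=[
        {'Name': 'EndpointName', 'Value': 'ml-endpoint'},
        {'Name': 'VariantName', 'Value': 'AllTraffic'},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,
    Statistics=['Average'],
)
print(stats['Datapoints'])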
Conclusion
Deploying machine learning models should be as reproducible and scalable as deploying application code. With AWS SageMaker, Docker, and API Gateway, DevOps engineers and data scientists can work together to streamline ML deployment pipelines.
By containerizing the model server, automating deployment with SageMaker, and exposing a secure REST API via API Gateway, your models are now production-ready, scalable, and observable. As you continue to iterate on your ML solutions, consider integrating CI/CD pipelines with CodePipeline and CodeBuild for automated testing and deployment.
Want to take it a step further? Try automating the entire process with Terraform or AWS CDK. And don’t forget to monitor and retrain your models regularly to ensure continued accuracy and relevance.