Deploying a Serverless Machine Learning Model on AWS SageMaker

Anshul Garg
3 min read

In this article, we will walk through deploying a machine learning model with AWS SageMaker Serverless Inference. This guide builds on my previous article on building a machine learning model with AWS SageMaker; the xgb estimator, container image, and execution role used below are created there. If you haven't read that yet, I recommend checking it out for the foundational concepts.

Prerequisites

Before we dive into the deployment process, ensure you have:

  • Completed the previous article on building a machine learning model with AWS SageMaker.

  • An AWS account with permissions to access SageMaker and S3.

  • The Boto3 library installed in your Jupyter Notebook environment.

Initialize Boto3 Clients

First, we need to initialize the Boto3 clients for SageMaker and SageMaker Runtime. This allows us to interact with the SageMaker service programmatically.

import boto3

client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")

Retrieve Model Artifacts

Next, we retrieve the model artifacts from our trained XGBoost model (the xgb estimator fitted in the previous article). The artifacts are a model.tar.gz in S3 containing the model's weights and configuration, which SageMaker needs in order to serve predictions.

# Access the model artifacts produced by training
# (xgb is the Estimator fitted in the previous article)
model_artifacts = xgb.model_data  # S3 URI of the model.tar.gz
print(model_artifacts)

Create and Register a Serverless Model

Now, we will register the model in Amazon SageMaker, specifying a model name, the container image, and the model artifacts stored in S3. The model itself is deployment-agnostic; the serverless behaviour is configured later, in the endpoint configuration.

from time import gmtime, strftime

# Generate a unique model name based on the current time
model_name = "xgboost-serverless" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Model name: " + model_name)

# Define environment variables for the container
byo_container_env_vars = {
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SOME_ENV_VAR": "myEnvVar"
}

# Create the model in SageMaker
# (container is the XGBoost image URI and role the execution role,
# both defined in the previous article)
create_model_response = client.create_model(
    ModelName=model_name,
    Containers=[{
        "Image": container,
        "Mode": "SingleModel",
        "ModelDataUrl": model_artifacts,
        "Environment": byo_container_env_vars,
    }],
    ExecutionRoleArn=role,
)

print("Model Arn: " + create_model_response["ModelArn"])
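The timestamp-suffix pattern above is reused for the endpoint-configuration and endpoint names below, so it can be factored into a small helper. A minimal sketch (unique_name is my own name for it, not part of the article's code):

```python
from time import gmtime, strftime

def unique_name(prefix):
    """Return the prefix with a UTC timestamp appended, so repeated
    notebook runs never collide on SageMaker resource names."""
    return prefix + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
```

With this, unique_name("xgboost-serverless") reproduces the model name above.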

Create an Endpoint Configuration

Next, we create an endpoint configuration for the deployment. This is the step that makes it serverless: instead of choosing an instance type, we provide a ServerlessConfig with a memory size (1024-6144 MB, in 1 GB increments) and the maximum number of concurrent invocations the endpoint should handle.

# Generate a unique endpoint configuration name
xgboost_epc_name = "mlops-serverless-epc" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

# Create an endpoint configuration for the XGBoost model
endpoint_config_response = client.create_endpoint_config(
    EndpointConfigName=xgboost_epc_name,
    ProductionVariants=[{
        "VariantName": "byoVariant",
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": 3072,
            "MaxConcurrency": 1,
        },
    }],
)

print("Endpoint Configuration Arn: " + endpoint_config_response["EndpointConfigArn"])

Create a Serverless Endpoint

With our endpoint configuration ready, we can now create a serverless endpoint that will serve our XGBoost model.

# Generate a unique endpoint name
endpoint_name = "xgboost-serverless-ep" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

# Create an endpoint using the previously created endpoint configuration
create_endpoint_response = client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=xgboost_epc_name,
)

print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])

Monitor Endpoint Creation Status

We need to monitor the creation status of our SageMaker endpoint until it is ready to serve predictions. (Boto3 also provides an endpoint_in_service waiter that performs this polling for you.)

import time

# Poll the endpoint until it leaves the "Creating" state
describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)

while describe_endpoint_response["EndpointStatus"] == "Creating":
    print(describe_endpoint_response["EndpointStatus"])
    time.sleep(15)  # wait 15 seconds before checking again
    describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)

print("Final Endpoint Status:", describe_endpoint_response["EndpointStatus"])
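The polling loop above can also be wrapped in a small helper with a timeout, so a failed deployment doesn't loop forever. A sketch under the assumption that you pass in a zero-argument callable wrapping client.describe_endpoint (wait_for_endpoint is my own helper, not a SageMaker API):

```python
import time

def wait_for_endpoint(describe, poll_seconds=15, timeout_seconds=900):
    """Poll describe() until the endpoint leaves 'Creating', or raise
    if it takes longer than timeout_seconds."""
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        status = describe()["EndpointStatus"]
        if status != "Creating":
            return status  # "InService" on success, "Failed" otherwise
        time.sleep(poll_seconds)
    raise TimeoutError("endpoint did not finish creating in time")
```

In the notebook you would call it as status = wait_for_endpoint(lambda: client.describe_endpoint(EndpointName=endpoint_name)).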

Invoke the Endpoint for Real-Time Inference

Once our endpoint is ready, we can invoke it to get real-time predictions.

# One record as comma-separated feature values, in the same order
# the model was trained on (text/csv format)
payload = b"3.,999.,0.,1.,0.,0.,0.,0.,0.,0.,0.,1.,0.,0.,0.,0.,0.,1.,0.,0."

# Invoke the SageMaker endpoint for real-time inference
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType="text/csv",
)

# Read and decode prediction result from response
print(response["Body"].read().decode())
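Rather than hand-writing the byte string, the payload can be built from a Python list, and the response parsed into a class label. A self-contained sketch (the feature values and the 0.5 threshold are illustrative assumptions; raw stands in for response["Body"].read()):

```python
# Illustrative 20-value feature vector; real values must follow the exact
# column order the model was trained on.
features = [3.0, 999.0, 0.0, 1.0] + [0.0] * 7 + [1.0] + [0.0] * 5 + [1.0] + [0.0] * 2

# Serialize to the comma-separated text/csv body the XGBoost container expects.
payload = ",".join(str(v) for v in features).encode("utf-8")

# The container returns the predicted probability as plain text.
raw = b"0.23"  # stand-in for response["Body"].read()
probability = float(raw.decode().strip())
label = int(probability >= 0.5)  # threshold the probability into a 0/1 decision
```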

Clean Up Resources

Finally, it's important to clean up resources by deleting the created models and endpoints to avoid unnecessary charges.

# Delete in dependency order: endpoint first, then its configuration, then the model
client.delete_endpoint(EndpointName=endpoint_name)
client.delete_endpoint_config(EndpointConfigName=xgboost_epc_name)
client.delete_model(ModelName=model_name)

Conclusion

In this article, we've deployed an XGBoost machine learning model on AWS SageMaker using Serverless Inference. This approach scales automatically with traffic (down to zero when idle) and bills only for the compute used to serve requests, with no servers to manage. For further details and code examples, feel free to explore my GitHub repository here. If you have any questions or need clarification on any of the steps, feel free to reach out!
