Deploying a Serverless Machine Learning Model on AWS SageMaker
In this article, we will walk through deploying a machine learning model with AWS SageMaker Serverless Inference. This guide is a follow-up to my previous article on building a machine learning model with AWS SageMaker; if you haven't read that yet, I recommend checking it out first, since the model trained there is the one we deploy here.
Prerequisites
Before we dive into the deployment process, ensure you have:
Completed the previous article on building a machine learning model with AWS SageMaker.
An AWS account with permissions to access SageMaker and S3.
The Boto3 library installed in your Jupyter Notebook environment.
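If Boto3 isn't installed yet, it can be added directly from a notebook cell (a minimal sketch, assuming a Jupyter environment; the sagemaker package is optional here but used in a later sketch):
%pip install boto3 sagemaker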
Initialize Boto3 Clients
First, we initialize two Boto3 clients: the sagemaker client, which manages resources such as models, endpoint configurations, and endpoints, and the sagemaker-runtime client, which invokes deployed endpoints for inference.
import boto3
client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")
Retrieve Model Artifacts
Next, we retrieve the S3 location of the artifacts produced by the XGBoost estimator trained in the previous article. These artifacts contain the model's weights and configuration, which SageMaker needs to serve predictions.
# "xgb" is the trained estimator from the previous article;
# model_data holds the S3 URI of the packaged model artifacts
model_artifacts = xgb.model_data
print(model_artifacts)
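If the xgb estimator is no longer in memory (for example, after a notebook restart), the same S3 URI can be recovered from the training job. A minimal sketch, assuming you know the job name (the name below is a hypothetical placeholder):
# Hypothetical training job name; replace with the job from the previous article
training_job_name = "my-xgboost-training-job"

# DescribeTrainingJob returns the S3 location of the packaged model artifacts
job_desc = client.describe_training_job(TrainingJobName=training_job_name)
model_artifacts = job_desc["ModelArtifacts"]["S3ModelArtifacts"]
print(model_artifacts)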
Create and Register a Serverless Model
Now, we will create a new serverless model in Amazon SageMaker. We specify the model name, container image, and the model artifacts stored in S3.
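The container image URI and execution role carry over from the previous article. If you are starting in a fresh session, here is a minimal sketch to define them with the SageMaker Python SDK (the XGBoost version "1.5-1" is an assumed example; use the version you trained with):
import sagemaker

# IAM role with SageMaker permissions; works inside SageMaker notebooks/Studio
role = sagemaker.get_execution_role()

# Region-specific XGBoost image URI; version "1.5-1" is an assumed example
container = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=boto3.Session().region_name,
    version="1.5-1",
)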
from time import gmtime, strftime
# Generate a unique model name based on the current time
model_name = "xgboost-serverless" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Model name: " + model_name)
# Define environment variables for the container
byo_container_env_vars = {
"SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
"SOME_ENV_VAR": "myEnvVar"
}
# Create the model in SageMaker
create_model_response = client.create_model(
    ModelName=model_name,
    Containers=[{
        "Image": container,  # XGBoost container image URI (see sketch above)
        "Mode": "SingleModel",
        "ModelDataUrl": model_artifacts,
        "Environment": byo_container_env_vars,
    }],
    ExecutionRoleArn=role,  # IAM role with SageMaker permissions
)
print("Model Arn: " + create_model_response["ModelArn"])
Create an Endpoint Configuration
Next, we create an endpoint configuration for the serverless deployment. The ServerlessConfig block is what makes the endpoint serverless: MemorySizeInMB sets the memory allocated to the endpoint (valid values are 1024 to 6144 MB, in 1 GB increments), and MaxConcurrency caps the number of concurrent invocations the variant will handle.
# Generate a unique endpoint configuration name
xgboost_epc_name = "mlops-serverless-epc" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
# Create an endpoint configuration for the XGBoost model
endpoint_config_response = client.create_endpoint_config(
    EndpointConfigName=xgboost_epc_name,
    ProductionVariants=[{
        "VariantName": "byoVariant",
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": 3072,  # 1024-6144 MB, in 1 GB increments
            "MaxConcurrency": 1,     # concurrent invocations before throttling
        },
    }],
)
print("Endpoint Configuration Arn: " + endpoint_config_response["EndpointConfigArn"])
Create a Serverless Endpoint
With our endpoint configuration ready, we can now create a serverless endpoint that will serve our XGBoost model.
# Generate a unique endpoint name
endpoint_name = "xgboost-serverless-ep" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
# Create an endpoint using the previously created endpoint configuration
create_endpoint_response = client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=xgboost_epc_name,
)
print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])
Monitor Endpoint Creation Status
We need to monitor the creation status of our SageMaker endpoint until it is ready to serve predictions.
import time

# Poll the endpoint status until it leaves the "Creating" state
describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)
while describe_endpoint_response["EndpointStatus"] == "Creating":
    describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)
    print(describe_endpoint_response["EndpointStatus"])
    time.sleep(15)  # wait 15 seconds before checking again
print("Final Endpoint Status:", describe_endpoint_response["EndpointStatus"])
Invoke the Endpoint for Real-Time Inference
Once our endpoint is ready, we can invoke it to get real-time predictions.
# Sample CSV payload; the 20 values must match the feature order used during training
payload = b"3.,999.,0.,1.,0.,0.,0.,0.,0.,0.,0.,1.,0.,0.,0.,0.,0.,1.,0.,0."
# Invoke the SageMaker endpoint for real-time inference
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType="text/csv",
)
# Read and decode prediction result from response
print(response["Body"].read().decode())
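If you are building the request from a feature list rather than a hand-written string, here is a minimal sketch (the feature values below are placeholders; the order must match the training data, and the parsing assumes the model returns a single numeric value):
# Placeholder feature vector; must follow the exact column order used in training
features = [3.0, 999.0, 0.0, 1.0] + [0.0] * 16

# Serialize to CSV bytes, invoke the endpoint, and parse the result
payload = ",".join(str(f) for f in features).encode("utf-8")
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType="text/csv",
)
prediction = float(response["Body"].read().decode())
print("Prediction:", prediction)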
Clean Up Resources
Finally, it's important to clean up resources by deleting the created models and endpoints to avoid unnecessary charges.
# Delete in reverse order of creation: endpoint first, then its configuration, then the model
client.delete_endpoint(EndpointName=endpoint_name)
client.delete_endpoint_config(EndpointConfigName=xgboost_epc_name)
client.delete_model(ModelName=model_name)
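To confirm the cleanup worked, you can list any endpoints that still match the naming prefix (a quick sanity check; the NameContains filter is optional):
# Verify no endpoints from this walkthrough remain
remaining = client.list_endpoints(NameContains="xgboost-serverless-ep")
print(remaining["Endpoints"])  # expect an empty list after cleanup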
Conclusion
In this article, we deployed an XGBoost machine learning model on AWS SageMaker using Serverless Inference. Serverless endpoints scale capacity automatically, including down to zero when idle, so you pay only for the compute used to serve requests instead of managing servers yourself. For further details and code examples, feel free to explore my GitHub repository here. If you have any questions or need further clarification on any of the steps, feel free to reach out!