Building Your Feature Store with AWS SageMaker: A Step-by-Step Guide
AWS SageMaker Feature Store helps you manage, organize, and access your machine learning features in a centralized and efficient way. In this guide, you’ll learn how to create, store, and retrieve features with SageMaker Feature Store.
Access the Notebook
If you'd like to run the code directly, visit the GitHub repository for access to the Jupyter notebook here.
Prerequisites
AWS account
SageMaker Studio instance
Basic familiarity with Python and SageMaker
Setting Up
Initialize the SageMaker Session and Role
Import the SageMaker SDK, then initialize the session and execution role that give your code access to AWS resources.

    import sagemaker

    session = sagemaker.Session()
    role = sagemaker.get_execution_role()
Set Up the Feature Store Client
Feature groups are created and managed through the FeatureGroup class from the SageMaker SDK. Records are written and read through the Feature Store runtime client from boto3.

    import boto3
    from sagemaker.feature_store.feature_group import FeatureGroup

    featurestore_client = boto3.client("sagemaker-featurestore-runtime")
Creating Feature Groups
A Feature Group is a logical grouping of features, stored in a table-like structure. We’ll create two feature groups, customers and orders, each with a unique identifier.
Define the Schema
Define the features you want to include in each group, ensuring you have a unique record identifier (e.g., customer_id for customers) and an event timestamp.
Create Customer Feature Group
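When creating the customer feature group, specify "customer_id" as the record identifier, which uniquely identifies each record. Also specify the event_time feature, which records when each event occurred. The create call additionally needs the feature definitions and an s3_uri for the offline store; the bucket prefix below is an assumption, so substitute your own.

    from sagemaker.feature_store.feature_definition import IntegralFeatureDefinition, StringFeatureDefinition

    customer_feature_group = FeatureGroup(
        name="customer_feature_group",
        sagemaker_session=session,
        # Schema matching the customer records ingested later in this guide
        feature_definitions=[StringFeatureDefinition(n) for n in ("customer_id", "event_time", "name")]
        + [IntegralFeatureDefinition("age")],
    )
    customer_feature_group.create(
        s3_uri=f"s3://{session.default_bucket()}/feature-store",  # assumed offline store prefix
        record_identifier_name="customer_id",
        event_time_feature_name="event_time",
        role_arn=role,
        enable_online_store=True,
    )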
Create Order Feature Group
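Similarly, define and create the orders feature group with order_id as the record identifier. Only the identifier and event time are defined below, because this guide does not list the order features; add your own before creating the group.

    from sagemaker.feature_store.feature_definition import StringFeatureDefinition

    order_feature_group = FeatureGroup(
        name="order_feature_group",
        sagemaker_session=session,
        # Minimal schema; extend it with your own order features
        feature_definitions=[StringFeatureDefinition(n) for n in ("order_id", "event_time")],
    )
    order_feature_group.create(
        s3_uri=f"s3://{session.default_bucket()}/feature-store",  # assumed offline store prefix
        record_identifier_name="order_id",
        event_time_feature_name="event_time",
        role_arn=role,
        enable_online_store=True,
    )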
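A feature group's schema is a list of feature names, each typed as String, Integral, or Fractional. As a rough sketch of the schema step, the helper below infers those types from one sample record. The function name and the rule that booleans are stored as String are assumptions made for illustration, and the output uses the FeatureName/FeatureType dictionary shape of the low-level CreateFeatureGroup API rather than SDK objects.

```python
from typing import Any, Dict, List


def infer_feature_definitions(sample: Dict[str, Any]) -> List[Dict[str, str]]:
    """Map each key of a sample record to a Feature Store feature type."""
    definitions = []
    for name, value in sample.items():
        # bool is a subclass of int, so it is checked first (assumption: store it as String)
        if isinstance(value, bool):
            feature_type = "String"
        elif isinstance(value, int):
            feature_type = "Integral"
        elif isinstance(value, float):
            feature_type = "Fractional"
        else:
            feature_type = "String"
        definitions.append({"FeatureName": name, "FeatureType": feature_type})
    return definitions


# Sample record shaped like the customer data used in this guide
sample_customer = {
    "customer_id": "C123",
    "event_time": "2023-11-01T00:00:00Z",
    "name": "Alice",
    "age": 30,
}
customer_definitions = infer_feature_definitions(sample_customer)
for definition in customer_definitions:
    print(definition)
```

Inferring types from a single record is fragile (a missing or None value falls back to String), so treat the result as a starting point to review rather than a final schema.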
Ingesting Data into Feature Groups
Load your data into these feature groups, which will store it for both online and offline use cases.
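Feature group creation is asynchronous, so ingestion should start only after the group's FeatureGroupStatus has moved from "Creating" to "Created". The sketch below polls a describe function until that happens. The helper name, polling interval, and retry limit are assumptions; in a notebook you would pass customer_feature_group.describe, while the stand-in function here lets the example run without AWS access.

```python
import time
from typing import Callable, Dict


def wait_until_created(
    describe: Callable[[], Dict],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> None:
    """Poll describe() until the feature group reports the Created status."""
    for _ in range(max_polls):
        status = describe().get("FeatureGroupStatus")
        if status == "Created":
            return
        if status != "Creating":
            # For example CreateFailed: stop instead of polling forever
            raise RuntimeError(f"Feature group entered unexpected status: {status}")
        time.sleep(poll_seconds)
    raise TimeoutError("Feature group was still being created after the polling limit")


# Stand-in for customer_feature_group.describe (hypothetical, for local illustration only)
_statuses = iter(["Creating", "Creating", "Created"])


def fake_describe() -> Dict:
    return {"FeatureGroupStatus": next(_statuses)}


wait_until_created(fake_describe, poll_seconds=0)
print("feature group is ready")
```

Against a real feature group, the call would look like wait_until_created(customer_feature_group.describe).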
Prepare the Data
Assume you have customer and order data as a list of dictionaries, one per record. Ensure every key matches a feature in the schema you defined earlier.

    customer_data = [
        {"customer_id": "C123", "event_time": "2023-11-01T00:00:00Z", "name": "Alice", "age": 30},
        {"customer_id": "C124", "event_time": "2023-11-01T01:00:00Z", "name": "Bob", "age": 25},
    ]
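Checking records against the schema before ingestion catches mistakes earlier than a failed write does. The following is a minimal sketch of such a check: it flags feature names that are not in the schema, a missing record identifier, and an event time that is not in the ISO-8601 form used in this guide. The function name, the schema list, and the deliberately broken record are hypothetical, and the check covers only this one timestamp format.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List


def validate_record(
    record: Dict[str, Any],
    allowed_features: Iterable[str],
    identifier: str,
    event_time: str = "event_time",
) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record looks fine."""
    problems = []
    unknown = sorted(set(record) - set(allowed_features))
    if unknown:
        problems.append(f"features not in schema: {unknown}")
    if record.get(identifier) in (None, ""):
        problems.append(f"missing record identifier '{identifier}'")
    try:
        # Matches timestamps such as 2023-11-01T00:00:00Z
        datetime.strptime(record.get(event_time), "%Y-%m-%dT%H:%M:%SZ")
    except (TypeError, ValueError):
        problems.append(f"'{event_time}' is missing or not in yyyy-MM-ddTHH:mm:ssZ form")
    return problems


schema = ["customer_id", "event_time", "name", "age"]
good = {"customer_id": "C123", "event_time": "2023-11-01T00:00:00Z", "name": "Alice", "age": 30}
# Hypothetical bad record: wrong date format and a feature that is not in the schema
bad = {"customer_id": "C125", "event_time": "01/11/2023", "nickname": "Al"}

print(validate_record(good, schema, "customer_id"))
print(validate_record(bad, schema, "customer_id"))
```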
Ingest Data into Feature Groups
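Use the put_record method to add each record to the feature store. Each record is passed as a list of FeatureName/ValueAsString pairs, so every value is converted to a string.

    for record in customer_data:
        featurestore_client.put_record(
            FeatureGroupName="customer_feature_group",
            # put_record expects one {"FeatureName", "ValueAsString"} entry per feature
            Record=[{"FeatureName": name, "ValueAsString": str(value)} for name, value in record.items()],
        )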
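When ingesting more than a handful of records, it helps to separate the conversion into the FeatureName/ValueAsString shape from the write itself, and to collect failures instead of stopping at the first one. The sketch below does that for any client object that exposes put_record. The helper names, the choice to skip None values, and the RecordingClient stand-in are assumptions for illustration; with boto3 you would pass the sagemaker-featurestore-runtime client instead.

```python
from typing import Any, Dict, List, Tuple


def to_feature_store_record(row: Dict[str, Any]) -> List[Dict[str, str]]:
    """Convert a plain dict into the list-of-pairs shape that put_record expects."""
    # Assumption: features with a None value are left out instead of being written as "None"
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in row.items()
        if value is not None
    ]


def ingest_rows(client: Any, feature_group_name: str, rows: List[Dict[str, Any]]) -> List[Tuple[Dict[str, Any], str]]:
    """Write every row and return (row, error message) pairs for the ones that failed."""
    failures = []
    for row in rows:
        try:
            client.put_record(
                FeatureGroupName=feature_group_name,
                Record=to_feature_store_record(row),
            )
        except Exception as exc:  # with boto3 this would typically be a ClientError
            failures.append((row, str(exc)))
    return failures


class RecordingClient:
    """Hypothetical stand-in for the runtime client so the sketch runs without AWS."""

    def __init__(self) -> None:
        self.calls = []

    def put_record(self, FeatureGroupName: str, Record: List[Dict[str, str]]) -> None:
        self.calls.append((FeatureGroupName, Record))


demo_client = RecordingClient()
demo_rows = [
    {"customer_id": "C123", "event_time": "2023-11-01T00:00:00Z", "name": "Alice", "age": 30},
    {"customer_id": "C124", "event_time": "2023-11-01T01:00:00Z", "name": "Bob", "age": 25},
]
demo_failures = ingest_rows(demo_client, "customer_feature_group", demo_rows)
print(len(demo_client.calls), "records sent,", len(demo_failures), "failures")
```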
Querying the Feature Store
Retrieve features based on your requirements.
Fetch Customer Data by ID
    response = featurestore_client.get_record(
        FeatureGroupName="customer_feature_group",
        RecordIdentifierValueAsString="C123",
    )
    customer_features = response["Record"]
This returns the customer features associated with customer_id "C123".
Fetch Multiple Records by ID Using a Batch Request
To retrieve several records in one call, including records from different feature groups, use batch_get_record on the same runtime client.

    all_records = featurestore_client.batch_get_record(
        Identifiers=[
            {
                "FeatureGroupName": "customer_feature_group",  # Name of the customer feature group
                "RecordIdentifiersValueAsString": ["C400", "C401"],  # Example customer IDs; use IDs you ingested
            },
            {
                "FeatureGroupName": "order_feature_group",  # Name of the order feature group
                "RecordIdentifiersValueAsString": ["C400", "C401"],  # Example order IDs; use IDs you ingested
            },
        ]
    )
    all_records
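Both get_record and batch_get_record return each record as a list of FeatureName/ValueAsString pairs, which is awkward to work with directly. The sketch below flattens those lists into ordinary dictionaries. The helper names are assumptions, and the sample response is hand-written to mirror the documented response shape rather than captured from a real call.

```python
from typing import Dict, List, Tuple


def record_to_dict(record: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten one Record list into {feature name: value as string}."""
    return {feature["FeatureName"]: feature["ValueAsString"] for feature in record}


def batch_response_to_dicts(response: Dict) -> Dict[Tuple[str, str], Dict[str, str]]:
    """Index batch results by (feature group name, record identifier)."""
    results = {}
    for item in response.get("Records", []):
        key = (item["FeatureGroupName"], item["RecordIdentifierValueAsString"])
        results[key] = record_to_dict(item["Record"])
    return results


# Hand-written sample in the shape of a batch_get_record response (illustrative values)
sample_response = {
    "Records": [
        {
            "FeatureGroupName": "customer_feature_group",
            "RecordIdentifierValueAsString": "C123",
            "Record": [
                {"FeatureName": "customer_id", "ValueAsString": "C123"},
                {"FeatureName": "name", "ValueAsString": "Alice"},
                {"FeatureName": "age", "ValueAsString": "30"},
            ],
        }
    ],
    "Errors": [],
    "UnprocessedIdentifiers": [],
}

customers = batch_response_to_dicts(sample_response)
print(customers[("customer_feature_group", "C123")])
```

Every value comes back as a string, so numeric features such as age need an explicit conversion (for example int(...)) before use. Identifiers that could not be fetched appear under Errors or UnprocessedIdentifiers in the response, not in Records.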
Wrapping Up
AWS SageMaker Feature Store offers a powerful way to centralize and manage your ML features, making model development and production easier. Try creating your own feature groups and experimenting with different datasets to get familiar with the full capabilities of Feature Store.
Written by Anshul Garg