Building Your Feature Store with AWS SageMaker: A Step-by-Step Guide

Anshul Garg

AWS SageMaker Feature Store helps you manage, organize, and access your machine learning features in a centralized and efficient way. In this guide, you’ll learn how to create, store, and retrieve features with SageMaker Feature Store.

Access the Notebook
If you'd like to run the code directly, visit the GitHub repository for access to the Jupyter notebook here.


Prerequisites

  • AWS account

  • SageMaker Studio instance

  • Basic familiarity with Python and SageMaker

Setting Up

  1. Initialize the SageMaker Session and Role
    Import the SageMaker SDK, then create a session and look up the execution role. The role's IAM permissions are what give Feature Store access to your AWS resources.

     import sagemaker
     session = sagemaker.Session()
     role = sagemaker.get_execution_role()
    
  2. Set Up the Feature Store Client
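    The Feature Store runtime client writes and reads individual records (put_record, get_record). The feature groups themselves are created and managed through the FeatureGroup class imported alongside it.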

     from sagemaker.feature_store.feature_group import FeatureGroup
     import boto3
    
     featurestore_client = boto3.client("sagemaker-featurestore-runtime")
    

Creating Feature Groups

A Feature Group is a logical grouping of features, stored in a table-like structure. We’ll create two feature groups, customers and orders, each with a unique identifier.

  1. Define the Schema
    Define the features you want to include in each group, ensuring you have a unique record identifier (e.g., customer_id for customers) and an event timestamp.

  2. Create Customer Feature Group

    When creating the customer feature group, specify "customer_id" as the record identifier, which uniquely identifies each record. Also, specify the event_time feature to track when the record was added.

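     # The group's feature definitions (its schema) must be set before create() is called,
     # e.g. via the feature_definitions argument or load_feature_definitions().
     customer_feature_group = FeatureGroup(name="customer_feature_group", sagemaker_session=session)
     customer_feature_group.create(
         # Offline store location; the default bucket and "feature-store" prefix are assumptions
         s3_uri=f"s3://{session.default_bucket()}/feature-store",
         record_identifier_name="customer_id",
         event_time_feature_name="event_time",
         role_arn=role,
         enable_online_store=True
     )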
    
  3. Create Order Feature Group

    Similarly, define and create the orders feature group with order_id as the record identifier.

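     # As with customers, the order feature definitions must be set before create() is called.
     order_feature_group = FeatureGroup(name="order_feature_group", sagemaker_session=session)
     order_feature_group.create(
         # Offline store location; the default bucket and "feature-store" prefix are assumptions
         s3_uri=f"s3://{session.default_bucket()}/feature-store",
         record_identifier_name="order_id",
         event_time_feature_name="event_time",
         role_arn=role,
         enable_online_store=True
     )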
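
Step 1 describes the schema without showing one. As a minimal sketch, a schema for the customer group can be derived from one of the sample customer records used later in this guide. The helper names `infer_feature_type` and `infer_feature_definitions` are hypothetical; the resulting dictionaries use the `FeatureName`/`FeatureType` shape of the low-level `create_feature_group` API, with the type names `String`, `Integral`, and `Fractional`.

```python
from typing import Any, Dict, List


def infer_feature_type(value: Any) -> str:
    """Map a Python value to a Feature Store type name (hypothetical helper)."""
    # bool is a subclass of int, so handle it first and store it as a string.
    if isinstance(value, bool):
        return "String"
    if isinstance(value, int):
        return "Integral"
    if isinstance(value, float):
        return "Fractional"
    return "String"


def infer_feature_definitions(sample: Dict[str, Any]) -> List[Dict[str, str]]:
    """Build one FeatureName/FeatureType entry per key of a sample record."""
    return [
        {"FeatureName": name, "FeatureType": infer_feature_type(value)}
        for name, value in sample.items()
    ]


# Sample record taken from the customer data in this guide.
sample_customer = {
    "customer_id": "C123",
    "event_time": "2023-11-01T00:00:00Z",
    "name": "Alice",
    "age": 30,
}
customer_schema = infer_feature_definitions(sample_customer)
```

With the SageMaker Python SDK, the equivalent is a list of `FeatureDefinition(feature_name=..., feature_type=FeatureTypeEnum.STRING)` objects passed to `FeatureGroup(..., feature_definitions=...)`. Inferring types from a single record is a convenience for small examples; for real data it is usually safer to write the schema out by hand.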
    

Ingesting Data into Feature Groups

Load your data into these feature groups, which will store it for both online and offline use cases.
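
Feature group creation is asynchronous, so records can only be written once a group reports the `Created` status. The polling loop below is a minimal sketch of that wait. `wait_until_created` is a hypothetical helper, and it accepts any zero-argument callable that returns a describe-style dictionary containing `FeatureGroupStatus`, such as `customer_feature_group.describe`.

```python
import time
from typing import Callable, Dict


def wait_until_created(
    describe: Callable[[], Dict],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> None:
    """Poll a describe-style callable until the feature group is created."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = describe().get("FeatureGroupStatus")
        if status == "Created":
            return
        if status != "Creating":
            # Any other status (for example "CreateFailed") will not recover by waiting.
            raise RuntimeError(f"Feature group creation ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("Timed out waiting for the feature group to be created")
        time.sleep(poll_seconds)


# Assumed usage with the groups created above:
# wait_until_created(customer_feature_group.describe)
# wait_until_created(order_feature_group.describe)
```

Passing the callable in, rather than the feature group object, keeps the helper easy to exercise with a stub before pointing it at a real group.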

  1. Prepare the Data
    Assume you have customer and order data as a list of dictionaries, one per record. Ensure every key matches a feature in the schema you defined earlier.

     customer_data = [
         {"customer_id": "C123", "event_time": "2023-11-01T00:00:00Z", "name": "Alice", "age": 30},
         {"customer_id": "C124", "event_time": "2023-11-01T01:00:00Z", "name": "Bob", "age": 25}
     ]
    
  2. Ingest Data into Feature Groups

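    Use the put_record method to write each record to the feature store. Each feature is passed as a FeatureName/ValueAsString pair, so every value is converted to a string first.

     for record in customer_data:
         featurestore_client.put_record(
             FeatureGroupName="customer_feature_group",
             # put_record expects a list of FeatureName/ValueAsString pairs
             Record=[
                 {"FeatureName": name, "ValueAsString": str(value)}
                 for name, value in record.items()
             ]
         )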
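
The advice to keep every record aligned with the schema can be checked before anything is sent. The sketch below is one way to do that in plain Python. `validate_record` and `expected_features` are hypothetical names, and the timestamp check assumes the `YYYY-MM-DDTHH:MM:SSZ` format used in the sample data.

```python
from datetime import datetime
from typing import Any, Dict, List, Set


def validate_record(
    record: Dict[str, Any],
    expected_features: Set[str],
    event_time_name: str = "event_time",
) -> List[str]:
    """Return a list of problems found in a record; an empty list means it looks fine."""
    problems = []
    present = set(record)
    missing = expected_features - present
    extra = present - expected_features
    if missing:
        problems.append(f"missing features: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected features: {sorted(extra)}")
    try:
        # Assumes ISO-8601 UTC timestamps in the same form as the sample data.
        datetime.strptime(str(record.get(event_time_name)), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        problems.append(f"{event_time_name} is not in YYYY-MM-DDTHH:MM:SSZ format")
    return problems


expected_features = {"customer_id", "event_time", "name", "age"}
good = {"customer_id": "C123", "event_time": "2023-11-01T00:00:00Z", "name": "Alice", "age": 30}
print(validate_record(good, expected_features))
```

Running this over the whole list before the ingestion loop surfaces schema mismatches locally instead of as API errors partway through a load.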
    

Querying the Feature Store

Read records back from the online store by their record identifier.

  1. Fetch Customer Data by ID

     response = featurestore_client.get_record(
         FeatureGroupName="customer_feature_group",
         RecordIdentifierValueAsString="C123"
     )
     customer_features = response['Record']
    

    This will return the customer features associated with customer_id "C123".

  2. Fetch Multiple Records by ID with a Batch Request

     # batch_get_record can read from several feature groups in one call
     all_records = featurestore_client.batch_get_record(
         Identifiers=[
             {
                 "FeatureGroupName": "customer_feature_group",  # Name of the customer feature group
                 "RecordIdentifiersValueAsString": ["C123", "C124"],  # Customer IDs to retrieve
             },
             {
                 "FeatureGroupName": "order_feature_group",  # Name of the order feature group
                 "RecordIdentifiersValueAsString": ["C400", "C401"],  # Order IDs to retrieve
             },
         ]
     )
     all_records["Records"]
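
Both calls return features as a list of name/value pairs, not as a flat mapping. As a small sketch, a helper can turn that list into a dictionary. `record_to_dict` is a hypothetical name, and `sample_response` is hand-written in the shape that `get_record` returns, using the sample customer from this guide; it is not real output.

```python
from typing import Dict, List


def record_to_dict(record: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten a list of FeatureName/ValueAsString pairs into a plain dictionary."""
    return {feature["FeatureName"]: feature["ValueAsString"] for feature in record}


# Hand-written stand-in for a get_record response (assumption, not captured output).
sample_response = {
    "Record": [
        {"FeatureName": "customer_id", "ValueAsString": "C123"},
        {"FeatureName": "event_time", "ValueAsString": "2023-11-01T00:00:00Z"},
        {"FeatureName": "name", "ValueAsString": "Alice"},
        {"FeatureName": "age", "ValueAsString": "30"},
    ]
}
customer = record_to_dict(sample_response["Record"])
```

For a batch response, the same helper can be applied to the `Record` list of each entry under `Records`. Values come back as strings, so numeric features such as `age` need converting before use.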
    

Wrapping Up

AWS SageMaker Feature Store offers a powerful way to centralize and manage your ML features, making model development and production easier. Try creating your own feature groups and experimenting with different datasets to get familiar with the full capabilities of Feature Store.
