Optimizing Data Retrieval from DynamoDB Using Boto3

Alon Shrestha
4 min read

While my expertise isn’t in databases, whenever I have to choose a database for my internal projects, I go with a NoSQL solution such as DynamoDB on AWS.

Recently, I built a project for real clients, which meant performance and efficiency became critical, especially for data retrieval.

I explored DynamoDB’s documentation, ran some performance tests, and made a few small but impactful changes. These changes led to a noticeable boost in data retrieval speed.

In this blog, I’ll share what I learned and how I implemented those improvements.

DynamoDB: Choosing the Right Method for Data Retrieval

I use the Python Boto3 library to access DynamoDB, which offers three main operations for retrieving data: GetItem, Query, and Scan. AWS recommends avoiding Scan whenever possible, because it reads every item in the table, which makes it the least efficient option for data retrieval.
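To make the cost of Scan concrete, here is a minimal sketch (not code from my project) of reading a whole table with the Resource API. Each Scan request returns at most 1 MB of data, so reading everything means following LastEvaluatedKey page by page, and every item read along the way consumes read capacity. The helper name scan_all is hypothetical, and table is assumed to be a boto3.resource('dynamodb').Table(...) object.

```python
from typing import Any, Dict, List


def scan_all(table: Any) -> List[Dict[str, Any]]:
    """Read every item in a table by following Scan pagination.

    `table` is assumed to be a boto3 DynamoDB Table resource. Each scan()
    call returns one page (at most 1 MB) plus LastEvaluatedKey if more remain.
    """
    items: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {}
    while True:
        response = table.scan(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            # No more pages: the whole table has been read.
            return items
        # Resume the scan exactly where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Usage (needs AWS credentials; "Clients" is a hypothetical table name):
#   import boto3
#   table = boto3.resource("dynamodb").Table("Clients")
#   everything = scan_all(table)  # reads every item, even if you need one
```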

When it comes to choosing between GetItem and Query, it largely depends on your specific use case. Here’s a quick breakdown:

  • Use GetItem: If you know the exact partition key (and sort key, if the table has one) and need to retrieve a single item.
clientName (Partition Key) | clientLocation
Client_Jun                 | Kathmandu, Nepal
Client_Gole                | Dhulikhel, Nepal
  • In this case, if you know the clientName (e.g., Client_Jun), you can use GetItem to retrieve that one specific record.

  • 📝 Note: If your table has a sort key, you must provide both the partition key and the sort key when using GetItem.

  • Use Query: If you need to retrieve multiple items that share the same partition key, optionally narrowed down by a condition on the sort key.

clientName (Partition Key) | clientId (Sort Key) | clientLocation
Client_Jun                 | 1010                | Kathmandu, Nepal
Client_Jun                 | 1011                | Dhulikhel, Nepal
Client_Jun                 | 1012                | Bhaktapur, Nepal
  • In this case, you can use Query to retrieve all entries where clientName = Client_Jun.

  • 📝 Note: You can query by the partition key alone, or add a condition on the sort key (for example =, <, between, or begins_with) to narrow the results.
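Here is a minimal sketch of both operations against the two example tables above, using the Resource API. The function names are hypothetical, and table is assumed to be a boto3.resource('dynamodb').Table(...) object; passing it in keeps the functions easy to test without AWS.

```python
from typing import Any, Dict, List, Optional


def get_client(table: Any, client_name: str) -> Optional[Dict[str, Any]]:
    """GetItem: fetch the single item whose partition key is client_name.

    Assumes a table keyed only on clientName (the first example table).
    Returns None when no item has that key.
    """
    response = table.get_item(Key={"clientName": client_name})
    return response.get("Item")


def query_client_entries(table: Any, client_name: str,
                         min_id: Optional[int] = None) -> List[Dict[str, Any]]:
    """Query: fetch every item that shares the partition key client_name.

    Assumes clientName is the partition key and clientId the sort key
    (the second example table). If min_id is given, a sort-key condition
    narrows the results to clientId >= min_id.
    """
    condition = "clientName = :name"
    values: Dict[str, Any] = {":name": client_name}
    if min_id is not None:
        condition += " AND clientId >= :min_id"
        values[":min_id"] = min_id
    response = table.query(
        KeyConditionExpression=condition,
        ExpressionAttributeValues=values,
    )
    # A single Query page is capped at 1 MB; pagination is omitted for brevity.
    return response.get("Items", [])
```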

My Experience: From Query to GetItem

In my case, I was initially using Query, but after reviewing the requirements, I realized that GetItem was a better fit since I only ever had one item per partition key. So, I updated my table structure by removing the sort key and modified my code to use GetItem instead.
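One detail worth knowing: DynamoDB doesn’t let you change the key schema of an existing table, so removing a sort key in practice means creating a new table with only a partition key and copying the data over. The sketch below, with a hypothetical helper and table name, builds the create_table arguments for the low-level client; the sort key type N is an assumption based on the numeric clientId values in the example table.

```python
from typing import Any, Dict, Optional


def client_table_definition(table_name: str,
                            sort_key: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for the low-level client's create_table().

    With sort_key=None the table has a partition key only, the shape that
    suits GetItem when there is exactly one item per partition key.
    """
    key_schema = [{"AttributeName": "clientName", "KeyType": "HASH"}]
    attributes = [{"AttributeName": "clientName", "AttributeType": "S"}]
    if sort_key is not None:
        # Assumed numeric sort key, like clientId in the example table.
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attributes.append({"AttributeName": sort_key, "AttributeType": "N"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,
        # On-demand billing, so no ProvisionedThroughput is required.
        "BillingMode": "PAY_PER_REQUEST",
    }


# Usage (needs AWS credentials; "Clients" is a hypothetical table name):
#   import boto3
#   boto3.client("dynamodb").create_table(**client_table_definition("Clients"))
```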

This change made a big impact: GetItem reads exactly one item by its full primary key, and in my tests it was considerably faster than the Query it replaced. Even with this improvement, I still saw opportunities to optimize further.

Two Ways to Use GetItem

Boto3 provides two ways to interact with DynamoDB’s GetItem API:

  1. Boto3 Low-Level Client API

     client = boto3.client('dynamodb')
    
  2. Boto3 High-Level Resource API

     dynamodb = boto3.resource('dynamodb')
    

I was using the high-level Resource API, which is easier to work with because it handles many complexities for you, such as converting values between DynamoDB’s typed format and native Python types.

Example: Retrieving Data Using Both APIs

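Here’s what I got when retrieving the same item through both the Client API and the Resource API, using the key below. It is written in the Client API’s typed format; with the Resource API you pass plain Python values instead (for example, 'ClientId': 1001).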

Key = {
    'ClientId': { 'N': '1001' },
    'ClientName': { 'S': 'Client_Jun' }
}
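As a minimal sketch of the two calls side by side (the table name Clients and the function names are hypothetical), note that the Client API needs the table name and the typed key, while a Table resource already knows its table and takes plain values. Both functions accept already-created boto3 objects so they can be tested without AWS.

```python
from typing import Any, Dict, Optional

# Key in the low-level Client API's typed format (same as above).
CLIENT_KEY = {
    "ClientId": {"N": "1001"},
    "ClientName": {"S": "Client_Jun"},
}

# The same key for the Resource API: plain Python values, no type tags.
RESOURCE_KEY = {
    "ClientId": 1001,
    "ClientName": "Client_Jun",
}


def get_with_client(client: Any, table_name: str) -> Optional[Dict[str, Any]]:
    """GetItem through boto3.client('dynamodb'); attributes stay typed."""
    response = client.get_item(TableName=table_name, Key=CLIENT_KEY)
    return response.get("Item")


def get_with_resource(table: Any) -> Optional[Dict[str, Any]]:
    """GetItem through a Table resource; attributes come back as Python types."""
    response = table.get_item(Key=RESOURCE_KEY)
    return response.get("Item")


# Usage (needs AWS credentials; "Clients" is a hypothetical table name):
#   import boto3
#   raw_item = get_with_client(boto3.client("dynamodb"), "Clients")
#   item = get_with_resource(boto3.resource("dynamodb").Table("Clients"))
```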
  • Output from the Client API:

      {
          'Item': {
              'ClientPosixProfile': {'M': {'Uid': {'N': '2008'}, 'SecondaryGids': {'L': [{'N': '2000'}]}, 'Gid': {'N': '2000'}}},
              'ClientUserName': {'S': 'Client_Jun'},
              'ClientHomeDirConfigMap': {'M': {'Entry': {'S': '/'}, 'Target': {'S': '/newclients/Client_Jun'}}},
              'ClientPassword': {'S': 'HahaHoHo@1234'}
          }
      }
    

    Every attribute comes back wrapped in a DynamoDB type descriptor (S for string, N for number, M for map, L for list, and so on), which you have to unwrap and convert yourself.

  • Output from the Resource API:

      {
          'Item': {
              'ClientPosixProfile': {'Uid': Decimal('2008'), 'SecondaryGids': [Decimal('2000')], 'Gid': Decimal('2000')},
              'ClientUserName': 'Client_Jun',
              'ClientHomeDirConfigMap': {'Entry': '/', 'Target': '/newclients/Client_Jun'},
              'ClientPassword': 'HahaHoHo@1234'
          }
      }
    

    The Resource API converts the data types for you, so you get plain Python values that are easier to use directly. Note that numbers come back as Decimal rather than int or float.
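If you use the Client API, you have to do that conversion yourself. Boto3 ships boto3.dynamodb.types.TypeDeserializer for this; the sketch below is a minimal hand-rolled version, for illustration only, that covers the types in the example output (S, N, M, L) plus BOOL and NULL, and returns numbers as Decimal just like the Resource API does. The function names are hypothetical.

```python
from decimal import Decimal
from typing import Any, Dict


def from_dynamodb(value: Dict[str, Any]) -> Any:
    """Unwrap one typed attribute value, e.g. {'N': '2008'} -> Decimal('2008').

    Assumes exactly one type tag per value, as DynamoDB returns it. Sets
    (SS, NS, BS) and binary (B) are left out of this sketch.
    """
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S":
        return raw
    if type_tag == "N":
        # Numbers arrive as strings; Decimal matches the Resource API output.
        return Decimal(raw)
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {key: from_dynamodb(inner) for key, inner in raw.items()}
    if type_tag == "L":
        return [from_dynamodb(inner) for inner in raw]
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def item_from_dynamodb(item: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Unwrap every top-level attribute of an item returned by the Client API."""
    return {name: from_dynamodb(attr) for name, attr in item.items()}
```

Applied to the 'Item' from the Client API output above, item_from_dynamodb produces the same dictionary that the Resource API returned.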

While the Resource API works well for most use cases, the low-level Client API proved faster and more efficient for data retrieval in my specific case. Here is the speed comparison from my tests:

  • GetItem using the Resource API took 1.4 seconds

  • GetItem using the Client API took 1.2 seconds

  • Using Query took 2.6 seconds
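If you want to repeat this kind of comparison on your own table, a small harness like the hypothetical time_calls below is one way to do it (it is not the script behind the numbers above). It calls a zero-argument function several times with time.perf_counter and reports the median, which is less sensitive to a single slow network round trip than one measurement.

```python
import statistics
import time
from typing import Callable, Dict


def time_calls(fn: Callable[[], object], runs: int = 20) -> Dict[str, float]:
    """Call fn `runs` times and return median/min/max durations in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


# Usage (needs AWS credentials; table name and key are hypothetical).
# Create the client/resource once, outside the timed call, so setup
# cost is not included in the measurement:
#   import boto3
#   client = boto3.client("dynamodb")
#   table = boto3.resource("dynamodb").Table("Clients")
#   print(time_calls(lambda: client.get_item(
#       TableName="Clients", Key={"clientName": {"S": "Client_Jun"}})))
#   print(time_calls(lambda: table.get_item(Key={"clientName": "Client_Jun"})))
```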

Conclusion

After switching from Query to the GetItem API, I immediately noticed an improvement in performance. But here’s the interesting part: using the low-level Client API for GetItem resulted in even faster data retrieval.

While the Resource API is easier to use and hides DynamoDB’s low-level data format, that convenience comes with some processing overhead, since every attribute has to be converted into a Python type. The Client API gives you raw, typed data that takes more work to handle, but in my tests it delivered a faster response time.

Thanks for reading,

-Alon


Written by

Alon Shrestha

Hi, I’m Alon, the author of this page! With a background in Computer Science, I’m deeply passionate about exploring and building in the world of ☁️ cloud technology. Outside of tech, I enjoy doing music 🎸, traveling 🥾, and sometimes fitness 🏋️‍♂️. Recently, I discovered a love for writing, which inspired me to create this website as a space to share my interests, journey, projects, and insights along the way. Hope you enjoy your time here, and thanks so much for being here!