Postgres vs DynamoDB: Data Modeling Insights

Data modeling is the blueprint for how your data is structured, stored, and accessed. The choice between a relational database like Postgres and a NoSQL database like DynamoDB often hinges on different data modeling philosophies and application requirements. In this post, we’ll explore the principles, benefits, and challenges of data modeling in these two databases, helping you make informed decisions for your applications.

Relationships

Relationships define how data entities interact with each other. In databases, these relationships can be one-to-one, one-to-many, or many-to-many.

How Postgres Handles Relationships:

Postgres, being a relational database, excels at handling various types of relationships using foreign keys and join operations.

One-to-One Relationships:

In this example, each user has one unique profile. The user_id in the Profiles table uniquely references the user_id in the Users table.

 CREATE TABLE Users (
     user_id SERIAL PRIMARY KEY,
     username VARCHAR(50) NOT NULL UNIQUE,
     email VARCHAR(100) NOT NULL UNIQUE,
     created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
 );

 CREATE TABLE Profiles (
     profile_id SERIAL PRIMARY KEY,
     user_id INT UNIQUE REFERENCES Users(user_id),
     bio TEXT,
     profile_picture VARCHAR(255)
 );

One-to-Many Relationships:

Here, one user can have multiple orders, establishing a one-to-many relationship between Users and Orders.

 CREATE TABLE Orders (
     order_id SERIAL PRIMARY KEY,
     user_id INT REFERENCES Users(user_id),
     order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
     total NUMERIC(10, 2) NOT NULL
 );

Many-to-Many Relationships:

This setup allows users to belong to multiple groups and groups to have multiple users, thus establishing a many-to-many relationship via the UserGroups join table.

 CREATE TABLE Groups (
     group_id SERIAL PRIMARY KEY,
     group_name VARCHAR(100) NOT NULL UNIQUE
 );

 CREATE TABLE UserGroups (
     user_id INT REFERENCES Users(user_id),
     group_id INT REFERENCES Groups(group_id),
     PRIMARY KEY (user_id, group_id)
 );

How DynamoDB Handles Relationships:

DynamoDB generally uses denormalization and embedding to manage relationships, avoiding the need for joins.

One-to-One Relationships:

In this example, user profile information is embedded directly within the user item.

 import boto3

 dynamodb = boto3.resource('dynamodb')

 # Create a table for Users with embedded profile information
 users_table = dynamodb.create_table(
     TableName='Users',
     KeySchema=[
         {'AttributeName': 'user_id', 'KeyType': 'HASH'}  # Partition key
     ],
     AttributeDefinitions=[
         {'AttributeName': 'user_id', 'AttributeType': 'S'}
     ],
     ProvisionedThroughput={
         'ReadCapacityUnits': 5,
         'WriteCapacityUnits': 5
     }
 )

 # Insert data with embedded profile
 users_table.put_item(
     Item={
         'user_id': '1',
         'username': 'JohnDoe',
         'email': 'john.doe@example.com',
         'profile': {
             'bio': 'Software Developer',
             'profile_picture': 'path/to/picture.jpg'
         }
     }
 )

One-to-Many Relationships:

Each order belongs to a single user, and order details are embedded within the order item.

 orders_table = dynamodb.create_table(
     TableName='Orders',
     KeySchema=[
         {'AttributeName': 'user_id', 'KeyType': 'HASH'},  # Partition key
         {'AttributeName': 'order_id', 'KeyType': 'RANGE'}  # Sort key
     ],
     AttributeDefinitions=[
         {'AttributeName': 'user_id', 'AttributeType': 'S'},
         {'AttributeName': 'order_id', 'AttributeType': 'S'}
     ],
     ProvisionedThroughput={
         'ReadCapacityUnits': 5,
         'WriteCapacityUnits': 5
     }
 )

 # Insert data example
 orders_table.put_item(
     Item={
         'user_id': '1',
         'order_id': '1001',
         'order_date': '2024-08-05T12:00:00Z',
         'total': 150.00,
         'items': [
             {'product_id': '101', 'name': 'Product A', 'quantity': 2, 'price': 50.00},
             {'product_id': '102', 'name': 'Product B', 'quantity': 1, 'price': 50.00}
         ]
     }
 )

Many-to-Many Relationships:

DynamoDB does not natively support many-to-many relationships in the same way as relational databases. Instead, it uses techniques like item collections and secondary indexes to model these relationships.

To model many-to-many relationships, such as users and groups, we might use a combination of tables and secondary indexes.

# Users table
users_table = dynamodb.create_table(
    TableName='Users',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'}  # Partition key
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'}
    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 5,
        'WriteCapacityUnits': 5
    }
)

# Groups table
groups_table = dynamodb.create_table(
    TableName='Groups',
    KeySchema=[
        {'AttributeName': 'group_id', 'KeyType': 'HASH'}  # Partition key
    ],
    AttributeDefinitions=[
        {'AttributeName': 'group_id', 'AttributeType': 'S'}
    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 5,
        'WriteCapacityUnits': 5
    }
)

# UserGroups table to model many-to-many relationship
user_groups_table = dynamodb.create_table(
    TableName='UserGroups',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},  # Partition key
        {'AttributeName': 'group_id', 'KeyType': 'RANGE'}  # Sort key
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'group_id', 'AttributeType': 'S'}
    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 5,
        'WriteCapacityUnits': 5
    }
)

Consequences of differences in handling relationships

Postgres:

One-to-One: Ensures data integrity and constraints easily with foreign keys. Efficient for data retrieval with SQL joins.
One-to-Many: Postgres works well with its usual capabilities.
Many-to-Many: Native support with join tables, ensuring data integrity and flexibility.

DynamoDB:

One-to-One: Embedding simplifies data retrieval and can be faster with data locality if your app needs to access the entire document. But it can lead to data redundancy which means slower writes and risk of inconsistencies. Separation requires additional queries since joins are not available.
One-to-Many: DynamoDB’s approach to one-to-many relationships often mirrors how data is structured in application code, using nested attributes and denormalized data models. This alignment reduces the need for complex data transformations and simplifies development, making it more intuitive for developers. However, it can lead to challenges like write amplification and scalability issues, especially in write-heavy scenarios and hence needs careful design.
Many-to-Many: More complex to model, requires additional design considerations such as secondary indexes and item collections.

Schema Design

The schema design defines the structure, constraints, and relationships between data entities. Both Postgres and DynamoDB handle schema design differently, each with its own advantages and trade-offs.

Schema Design in Postgres

Postgres relies on a well-defined, rigid schema.

Structured Schema: In Postgres, tables are created with a predefined schema that includes the data types for each column, constraints, and relationships. This structured schema enforces data integrity and ensures that all data adheres to the defined format.

In this example, the Users table is created with specific columns (user_id, username, and email), and constraints like PRIMARY KEY and UNIQUE are enforced. This ensures that each user has a unique ID and email, maintaining the integrity of the data.
```
  CREATE TABLE Users (
      user_id SERIAL PRIMARY KEY,
      username VARCHAR(50) NOT NULL,
      email VARCHAR(100) UNIQUE NOT NULL
  );
```
Strong Typing and Constraints: Postgres enforces strong typing, meaning that each column in a table must contain data of the specified type. Constraints like NOT NULL, CHECK, and FOREIGN KEY further enforce the rules about what data can be stored.

This constraint ensures that the email column follows a basic format, further enhancing data validity.
```
  ALTER TABLE Users ADD CONSTRAINT email_format CHECK (email LIKE '%_@__%.__%');
```

Schema Design in DynamoDB

DynamoDB takes a flexible, schema-less approach.

Schema-Less Flexibility: Unlike Postgres, DynamoDB doesn’t require a predefined schema for all attributes. Instead, it only enforces the schema for the primary key. This allows for great flexibility in storing different types of data within the same table.

In this example, both items are stored in the same Users table but have different attributes. DynamoDB doesn’t enforce a fixed schema, allowing each item to have its own set of attributes.
```
  {
      'user_id': '1',
      'username': 'Alice',
      'email': 'alice@example.com',
      'age': 30
  }
  {
      'user_id': '2',
      'username': 'Bob',
      'preferences': {'theme': 'dark', 'notifications': 'enabled'}
  }
```

Consequences of Postgres Schema Design:

Pros:
- Data Integrity: The rigid schema and strong typing ensure that data remains consistent and adheres to predefined rules.
- Query Optimization: The defined schema allows Postgres to optimize queries, improving performance for complex operations.
Cons:
- Complex Migrations: Altering the schema, especially in production environments with large datasets, can be complex and time-consuming.

Consequences of DynamoDB Schema Design:

Pros:
- Flexibility: DynamoDB’s schema-less design makes it easy to add new attributes without needing to modify the schema or migrate existing data. This is particularly useful in agile development environments where requirements can change frequently.
- Simplified Development: Developers can focus on the data they need to store without worrying about enforcing a strict schema.
- Scalability: DynamoDB’s flexible schema is well-suited for horizontally scaling applications where different entities might have different data needs.
Cons:
- Potential for Inconsistent Data: Without enforced constraints, there’s a risk of storing inconsistent or incomplete data, which can lead to data quality issues.
- Complex Querying: DynamoDB is often described as "schema on read," meaning that while the database doesn't enforce a strict schema when data is written, the application must define how to interpret and process that data when it's read. This requires developers to carefully manage and structure their queries, as they must account for the different possible structures of the data.

Conclusion

In summary, use Postgres for strong data integrity and complex relationships. Opt for DynamoDB when you need flexibility and scalability, with careful consideration of how to handle complex relationships and queries.

Choosing the Right Database: Data modeling in Postgres vs DynamoDB