Introduction

In a world where data changes faster than ever, keeping track of these changes is no longer optional, it is essential for understanding trends, running accurate reports, and making informed decisions. When a customer changes their address, when a product gets rebranded, when an employee moves to another department, how do you reflect these changes in your data warehouse without losing valuable historical insights?

There are several data warehousing techniques to manage historical data such as audit tables, temporal tables, and data vault modelling. However, in this article, we will focus on one of the most widely adopted approaches: Slowly Changing Dimensions (SCDs).

In this article, we’ll explore:

What are SCDs and why they matter
The different types of SCDs
How to choose the right type for your specific use case

Sample Customer Data for SCD Examples

To illustrate the different SCD techniques, we'll use a simple CUSTOMER table as our example throughout this article.

The table stores basic information about our customers:

ID: A unique identifier for each customer.
NAME: The customer's name.
COUNTRY: The customer's country of residence.
EMAIL: The customer's e-mail address.

Here is a sample of the data:

ID	NAME	COUNTRY	EMAIL
1	Michael Scott	USA	michael.scott@dundermifflinpc.com
2	Dwight Schrute	USA	dwight.schrute@dundermifflinpc.com

Slowly Changing Dimension (SCD)

SCDs are a set of data warehousing techniques used to manage and track changes to data over time. They address the fact that certain attributes of an entity—like a product's production cost or a customer's address—can change over time, and those changes need to be recorded for accurate reporting and analysis.

Dimensions

In the context of SCDs, dimensions refer to descriptive attributes of an entity. In our CUSTOMER table, the fields NAME, COUNTRY, and EMAIL are all dimensions describing a single customer.

SCD Types

SCD Type 0 (SCD0): No Changes Allowed

SCD Type 0 does not allow changes to the original data once ingested. It is mainly used for static reference data that should never change (e.g. country codes). If a correction is required, it is handled through a formal exception processes rather than by automatically updating data.

SCD Type 1 (SCD1): Overwriting Existing Data

SCD Type 1 updates the current value of a field by overwriting it with the new value. No historical data is retained.

It is useful when:

Only the most recent version of a dimension attribute is needed.
Tracking historical changes is not required.

✅ Strengths

Simplicity: Easy to implement and maintain.
Performance: Requires less storage and results in faster queries, since there's no need to manage historical records.
Relevance: Ensures only current, actionable data is used.

⚠️ Weaknesses

No historical data tracking: Once data is overridden, the previous value is lost forever—unless a backup mechanism is in place.

📌 Example Scenario

Over time, customers may update their contact information—for instance, they might stop using an old email or switch to another account. In such cases, keeping only the most recent contact details is usually sufficient. Storing outdated information could lead to confusion or failed communication attempts if not handled properly.

Suppose Michael changes his email to: michael.g.scott@dundermifflinpc.com.

The updated table would look like this:

ID	NAME	COUNTRY	EMAIL
1	Michael Scott	USA	michael.g.scott@dundermifflinpc.com
2	Dwight Schrute	USA	dwight.schrute@dundermifflinpc.com

SCD Type 2 (SCD2): Adding a New Row

SCD Type 2 tracks historical data by adding a new row to the table for each update. The previous row is preserved, typically with an end date that marks when it stopped being valid. This method allows you to trace the evolution of an entity over time.

There are multiple ways to implement SCD2, but a common and effective approach includes adding the following technical columns:

DATE_START: (Mandatory) The date from which the record is valid.
DATE_END: (Mandatory) The date until which the record is valid. For the current version of the record, the value could be set to a far-future date or NULL.
IS_DELETED: (Optional) A flag indicating whether the record is deleted. Useful when deletions need to be tracked.
TIMESTAMP_INSERTED: (Optional) The timestamp when the record was inserted.
TIMESTAMP_UPDATED: (Optional) The timestamp when the record was last updated.

Considerations

Change granularity: Decide the level of detail you want to track (daily, hourly, by minute, etc.). This will guide the appropriate datatype and format for DATE_START and DATE_END.
Composite primary key: The new primary key will typically be a combination of the natrual key and the DATE_START field.
Update detection: Ensure your process detects and tracks only meaningful changes. It should not create new rows when no actual data has changed.
Standardization: Define how your SCD2 logic should behave:
- Is DATE_END inclusive or exclusive?
- Should DATE_END be NULL or a predefined future date?
- How will you manage TIMESTAMP_INSERTED and TIMESTAMP_UPDATED?
Field-level tracking: Decide whether all fields should trigger a historical entry, or if certain fields (e.g. technical metadata) should be excluded. Be explicit about which ones.

✅ Strengths

Historical tracking: Preserves a full history of changes, enabling accurate time-based analysis and reporting.

⚠️ Weaknesses

Complexity: More complex to implement and maintain than simpler SCD types.
Performance: Requires more storage to maintain historical data. Queries on SCD2 tables must account for time ranges to ensure accuracy.

📌 Example Scenario

Let’s update our example to include the SCD2 technical columns:

ID	DATE_START	DATE_END	IS_DELETED	TIMESTAMP_INSERTED	TIMESTAMP_UPDATED	NAME	COUNTRY	EMAIL
1	2025-01-01	9999-12-31	FALSE	2025-01-01 14:24:49	2025-01-01 14:24:49	Michael Scott	USA	michael.scott@dundermifflinpc.com
2	2025-01-01	9999-12-31	FALSE	2025-01-01 14:24:49	2025-01-01 14:24:49	Dwight Schrute	USA	dwight.schrute@dundermifflinpc.com

On 2025-02-17 Michael moves to Spain and updates his profile. On 2025-04-18 Dwight removes his information from our system.

After applying SCD2, we would have the following data in our table:

ID	DATE_START	DATE_END	IS_DELETED	TIMESTAMP_INSERTED	TIMESTAMP_UPDATED	NAME	COUNTRY	EMAIL
1	2025-01-01	2025-02-16	FALSE	2025-01-01 14:24:49	2025-02-17 10:10:00	Michael Scott	USA	michael.scott@dundermifflinpc.com
1	2025-02-17	9999-12-31	FALSE	2025-02-17 10:10:00	2025-02-17 10:10:00	Michael Scott	Spain	michael.scott@dundermifflinpc.com
2	2025-01-01	2025-04-17	FALSE	2025-01-01 14:24:49	2025-04-18 11:11:00	Dwight Schrute	USA	dwight.schrute@dundermifflinpc.com
2	2025-04-18	9999-12-31	TRUE	2025-04-18 11:11:00	2025-04-18 11:11:00	Dwight Schrute	USA	dwight.schrute@dundermifflinpc.com

SCD Type 3 (SCD3): Adding a New Column

SCD Type 3 tracks limited historical data by adding a new column for each dimension you want to preserve a previous value for. This approach generally allows you to store the current value and the immediate past value of a dimension.

✅ Strengths

Relevance: Ensures only current, actionable data is stored.
Performance: Requires less storage compared to row-based history, since historical values are stored in a few additional columns rather than entire rows.
Selective historicization: You can define which specific dimensions to track historically, allowing you to discard history for less relevant fields.

⚠️ Weaknesses

Limited historical tracking: You can only track as much history as you have columns for. For example, storing more than one previous value would require multiple additional columns per field, which becomes unmanageable.
Increased query complexity: You'll need to handle additional columns in queries, which can complicate analysis logic.
Table definition complexity: As more dimensions require historical tracking, the table structure (DDL) can become messy and harder to manage.

📌 Example Scenario

Suppose we’re only interested in tracking the previous country of residence for each customer. The initial table structure would thus be:

ID	NAME	PREVIOUS_COUNTRY	COUNTRY	EMAIL
1	Michael Scott	NULL	USA	michael.scott@dundermifflinpc.com
2	Dwight Schrute	NULL	USA	dwight.schrute@dundermifflinpc.com

Later, Michael moves to Spain and updates his profile. After applying SCD3, the table becomes:

ID	NAME	PREVIOUS_COUNTRY	COUNTRY	EMAIL
1	Michael Scott	USA	Spain	michael.scott@dundermifflinpc.com
2	Dwight Schrute	NULL	USA	dwight.schrute@dundermifflinpc.com

SCD Type 4 (SCD4): Separating History and Current Data

SCD Type 4 uses two separate tables to manage data changes:

A history table that tracks all changes to an entity (much like an SCD2 table).
A current table that contains only the most recent snapshot for each record.

✅ Strengths

Historical tracking: Preserves a full history of changes, enabling accurate time-based analysis and reporting.
Efficient access to current data: Queries that only need the latest version can utilize the current table directly, improving performance.

⚠️ Weaknesses

Complexity: Maintaining two synchronized tables adds implementation and maintenance complexity.
Storage requirements: Requires more storage to maintain historical data.

📌 Example Scenario

In our scenario, Michael moves to Spain. After applying SCD4, we obtain the following two tables:

CUSTOMER_HISTORY:

ID	DATE_START	DATE_END	IS_DELETED	TIMESTAMP_INSERTED	TIMESTAMP_UPDATED	NAME	COUNTRY	EMAIL
1	2025-01-01	2025-02-16	FALSE	2025-01-01 14:24:49	2025-02-17 10:10:00	Michael Scott	USA	michael.scott@dundermifflinpc.com
1	2025-02-17	9999-12-31	FALSE	2025-02-17 10:10:00	2025-02-17 10:10:00	Michael Scott	Spain	michael.scott@dundermifflinpc.com
2	2025-01-01	9999-12-31	FALSE	2025-01-01 14:24:49	2025-01-01 14:24:49	Dwight Schrute	USA	dwight.schrute@dundermifflinpc.com

CUSTOMER_CURRENT:

ID	NAME	COUNTRY	EMAIL
1	Michael Scott	Spain	michael.scott@dundermifflinpc.com
2	Dwight Schrute	USA	dwight.schrute@dundermifflinpc.com

SCD Type 6 (SCD6): A Hybrid Approach

SCD Type 6 is a hybrid technique that combines elements from multiple SCD types to capture a rich historical view of your data:

SCD1: In-place updates for non-historized fields (e.g. technical metadata).
SCD2: Full row-level history tracking.
SCD3: Immediate past values stored in separate columns for selected fields.

✅ Strengths

Historical tracking: Preserves a full history of changes, enabling accurate time-based analysis and reporting.
Quick comparisions: Allows to easily compare current and previous values without scanning older rows.

⚠️ Weaknesses

Complexity: The most complex SCD type to implement, maintain, and document correctly.
Performance: Higher storage and query complexity. Queries must account for both row-based versions and column-level changes to accurately reflect the data for a specific timeframe.

📌 Example Scenario

We start with the same base scenario used in the SCD2 example. Michael moves to Spain on 2025-02-17, and Dwight deletes his profile on 2025-04-18.

After applying SCD6, our table structure could look like this:

ID	DATE_START	DATE_END	IS_DELETED	TIMESTAMP_INSERTED	TIMESTAMP_UPDATED	NAME	PREVIOUS_COUNTRY	COUNTRY	EMAIL
1	2025-01-01	2025-02-16	FALSE	2025-01-01 14:24:49	2025-02-17 10:10:00	Michael Scott	NULL	USA	michael.scott@dundermifflinpc.com
1	2025-02-17	9999-12-31	FALSE	2025-02-17 10:10:00	2025-02-17 10:10:00	Michael Scott	USA	Spain	michael.scott@dundermifflinpc.com
2	2025-01-01	2025-04-17	FALSE	2025-01-01 14:24:49	2025-04-18 11:11:00	Dwight Schrute	NULL	USA	dwight.schrute@dundermifflinpc.com
2	2025-04-18	9999-12-31	TRUE	2025-04-18 11:11:00	2025-04-18 11:11:00	Dwight Schrute	NULL	USA	dwight.schrute@dundermifflinpc.com

This structure:

Maintains a full history of changes via rows (SCD2).
Tracks the immediately previous value of the COUNTRY field (SCD3).

Selecting the Correct SCD Type

Selecting the appropriate SCD technique depends on your specific business needs, the characteristics and nature of your data, and system capabilities. Here are key factors to consider:

Reporting requirements: Do you need to track historical data or only the most recent values? How far back must your history go?
Frequency of changes: How often does your data change?
Performance impact: As historical data grows, some SCD types (e.g., SCD2 or SCD6) may degrade performance. Define a strategy to mitigate these challenges.
Implementation effort: Some SCD types require more effort to develop and maintain. Choose an approach according to your available resources.
Source data quality: If your source data lacks reliable timestamps, change flags, or consistent update patterns—or if the ETL process doesn't handle them properly—historical data may become inaccurate.

Summary Table

The following table summarizes and compares the different SCD techniques:

🏷️ SCD Type	🕒 History Tracking	⚙️ Complexity	🚀 Performance	💾 Storage Consumption	📌 Example Use Case
SCD0	❌ None	🟢 Very Low	🟢 Very High	🟢 Minimal	Static reference data
SCD1	❌ None (Overwrite)	🟢 Low	🟢 High	🟢 Minimal	Cases where historical data is irrelevant
SCD2	✅ Full (row-level)	🔴 High	🔴 Low	🔴 High	Auditing, compliance, and time-based analysis
SCD3	☑️ Partial (column-level)	🟡 Medium	🟡 Medium	🟡 Medium	Scenarios needing only the most recent change
SCD4	✅ Full (separate table)	🔴 High	🟡 Medium	🔴 High	When separating history improves current-data performance
SCD6	✅ Full (row + column-level)	🔴 Very High	🔴 Low / Medium	🔴 High	Complex auditing with need for fast comparisons between current and past states

Conclusion

Slowly Changing Dimensions might seem straightforward at first, but as data models grow and business rules evolve, getting them right becomes more complex and critical. By carefully selecting the appropriate SCD type and implementing it consistently across your warehouse, you set the stage for reliable reporting, smoother audits, and accurate analytics.

Each technique has its strengths and trade-offs. The key is to align your choice with your data needs and available resources. While SCDs are not the only way to track historical data, when done right, they ensure your data remains both actionable and trustworthy.

Managing Historical Data with Slowly Changing Dimensions

Table of contents

Introduction

Sample Customer Data for SCD Examples

Slowly Changing Dimension (SCD)

Dimensions

SCD Types

SCD Type 0 (SCD0): No Changes Allowed

SCD Type 1 (SCD1): Overwriting Existing Data

✅ Strengths

⚠️ Weaknesses

📌 Example Scenario

SCD Type 2 (SCD2): Adding a New Row

Considerations

✅ Strengths

⚠️ Weaknesses

📌 Example Scenario

SCD Type 3 (SCD3): Adding a New Column

✅ Strengths

⚠️ Weaknesses

📌 Example Scenario

SCD Type 4 (SCD4): Separating History and Current Data

✅ Strengths

⚠️ Weaknesses

📌 Example Scenario

SCD Type 6 (SCD6): A Hybrid Approach

✅ Strengths

⚠️ Weaknesses

📌 Example Scenario

Selecting the Correct SCD Type

Summary Table

Conclusion

Subscribe to my newsletter

Elie Fayad

Elie Fayad