Managing Historical Data with Slowly Changing Dimensions


Introduction
In a world where data changes faster than ever, keeping track of these changes is no longer optional, it is essential for understanding trends, running accurate reports, and making informed decisions. When a customer changes their address, when a product gets rebranded, when an employee moves to another department, how do you reflect these changes in your data warehouse without losing valuable historical insights?
There are several data warehousing techniques to manage historical data such as audit tables, temporal tables, and data vault modelling. However, in this article, we will focus on one of the most widely adopted approaches: Slowly Changing Dimensions (SCDs).
In this article, we’ll explore:
- What are SCDs and why they matter
- The different types of SCDs
- How to choose the right type for your specific use case
Sample Customer Data for SCD Examples
To illustrate the different SCD techniques, we'll use a simple CUSTOMER
table as our example throughout this article.
The table stores basic information about our customers:
ID
: A unique identifier for each customer.NAME
: The customer's name.COUNTRY
: The customer's country of residence.EMAIL
: The customer's e-mail address.
Here is a sample of the data:
ID | NAME | COUNTRY | |
1 | Michael Scott | USA | michael.scott@dundermifflinpc.com |
2 | Dwight Schrute | USA | dwight.schrute@dundermifflinpc.com |
Slowly Changing Dimension (SCD)
SCDs are a set of data warehousing techniques used to manage and track changes to data over time. They address the fact that certain attributes of an entity—like a product's production cost or a customer's address—can change over time, and those changes need to be recorded for accurate reporting and analysis.
Dimensions
In the context of SCDs, dimensions refer to descriptive attributes of an entity. In our CUSTOMER
table, the fields NAME
, COUNTRY
, and EMAIL
are all dimensions describing a single customer.
SCD Types
SCD Type 0 (SCD0): No Changes Allowed
SCD Type 0 does not allow changes to the original data once ingested. It is mainly used for static reference data that should never change (e.g. country codes). If a correction is required, it is handled through a formal exception processes rather than by automatically updating data.
SCD Type 1 (SCD1): Overwriting Existing Data
SCD Type 1 updates the current value of a field by overwriting it with the new value. No historical data is retained.
It is useful when:
- Only the most recent version of a dimension attribute is needed.
- Tracking historical changes is not required.
✅ Strengths
- Simplicity: Easy to implement and maintain.
- Performance: Requires less storage and results in faster queries, since there's no need to manage historical records.
- Relevance: Ensures only current, actionable data is used.
⚠️ Weaknesses
- No historical data tracking: Once data is overridden, the previous value is lost forever—unless a backup mechanism is in place.
📌 Example Scenario
Over time, customers may update their contact information—for instance, they might stop using an old email or switch to another account. In such cases, keeping only the most recent contact details is usually sufficient. Storing outdated information could lead to confusion or failed communication attempts if not handled properly.
Suppose Michael changes his email to: michael.g.scott@dundermifflinpc.com
.
The updated table would look like this:
ID | NAME | COUNTRY | |
1 | Michael Scott | USA | michael.g.scott@dundermifflinpc.com |
2 | Dwight Schrute | USA | dwight.schrute@dundermifflinpc.com |
SCD Type 2 (SCD2): Adding a New Row
SCD Type 2 tracks historical data by adding a new row to the table for each update. The previous row is preserved, typically with an end date that marks when it stopped being valid. This method allows you to trace the evolution of an entity over time.
There are multiple ways to implement SCD2, but a common and effective approach includes adding the following technical columns:
- DATE_START: (Mandatory) The date from which the record is valid.
- DATE_END: (Mandatory) The date until which the record is valid. For the current version of the record, the value could be set to a far-future date or
NULL
. - IS_DELETED: (Optional) A flag indicating whether the record is deleted. Useful when deletions need to be tracked.
- TIMESTAMP_INSERTED: (Optional) The timestamp when the record was inserted.
- TIMESTAMP_UPDATED: (Optional) The timestamp when the record was last updated.
Considerations
- Change granularity: Decide the level of detail you want to track (daily, hourly, by minute, etc.). This will guide the appropriate datatype and format for
DATE_START
andDATE_END
. - Composite primary key: The new primary key will typically be a combination of the natrual key and the
DATE_START
field. - Update detection: Ensure your process detects and tracks only meaningful changes. It should not create new rows when no actual data has changed.
- Standardization: Define how your SCD2 logic should behave:
- Is
DATE_END
inclusive or exclusive? - Should
DATE_END
beNULL
or a predefined future date? - How will you manage
TIMESTAMP_INSERTED
andTIMESTAMP_UPDATED
?
- Is
- Field-level tracking: Decide whether all fields should trigger a historical entry, or if certain fields (e.g. technical metadata) should be excluded. Be explicit about which ones.
✅ Strengths
- Historical tracking: Preserves a full history of changes, enabling accurate time-based analysis and reporting.
⚠️ Weaknesses
- Complexity: More complex to implement and maintain than simpler SCD types.
- Performance: Requires more storage to maintain historical data. Queries on SCD2 tables must account for time ranges to ensure accuracy.
📌 Example Scenario
Let’s update our example to include the SCD2 technical columns:
ID | DATE_START | DATE_END | IS_DELETED | TIMESTAMP_INSERTED | TIMESTAMP_UPDATED | NAME | COUNTRY | |
1 | 2025-01-01 | 9999-12-31 | FALSE | 2025-01-01 14:24:49 | 2025-01-01 14:24:49 | Michael Scott | USA | michael.scott@dundermifflinpc.com |
2 | 2025-01-01 | 9999-12-31 | FALSE | 2025-01-01 14:24:49 | 2025-01-01 14:24:49 | Dwight Schrute | USA | dwight.schrute@dundermifflinpc.com |
On 2025-02-17 Michael moves to Spain and updates his profile. On 2025-04-18 Dwight removes his information from our system.
After applying SCD2, we would have the following data in our table:
ID | DATE_START | DATE_END | IS_DELETED | TIMESTAMP_INSERTED | TIMESTAMP_UPDATED | NAME | COUNTRY | |
1 | 2025-01-01 | 2025-02-16 | FALSE | 2025-01-01 14:24:49 | 2025-02-17 10:10:00 | Michael Scott | USA | michael.scott@dundermifflinpc.com |
1 | 2025-02-17 | 9999-12-31 | FALSE | 2025-02-17 10:10:00 | 2025-02-17 10:10:00 | Michael Scott | Spain | michael.scott@dundermifflinpc.com |
2 | 2025-01-01 | 2025-04-17 | FALSE | 2025-01-01 14:24:49 | 2025-04-18 11:11:00 | Dwight Schrute | USA | dwight.schrute@dundermifflinpc.com |
2 | 2025-04-18 | 9999-12-31 | TRUE | 2025-04-18 11:11:00 | 2025-04-18 11:11:00 | Dwight Schrute | USA | dwight.schrute@dundermifflinpc.com |
SCD Type 3 (SCD3): Adding a New Column
SCD Type 3 tracks limited historical data by adding a new column for each dimension you want to preserve a previous value for. This approach generally allows you to store the current value and the immediate past value of a dimension.
✅ Strengths
- Relevance: Ensures only current, actionable data is stored.
- Performance: Requires less storage compared to row-based history, since historical values are stored in a few additional columns rather than entire rows.
- Selective historicization: You can define which specific dimensions to track historically, allowing you to discard history for less relevant fields.
⚠️ Weaknesses
- Limited historical tracking: You can only track as much history as you have columns for. For example, storing more than one previous value would require multiple additional columns per field, which becomes unmanageable.
- Increased query complexity: You'll need to handle additional columns in queries, which can complicate analysis logic.
- Table definition complexity: As more dimensions require historical tracking, the table structure (DDL) can become messy and harder to manage.
📌 Example Scenario
Suppose we’re only interested in tracking the previous country of residence for each customer. The initial table structure would thus be:
ID | NAME | PREVIOUS_COUNTRY | COUNTRY | |
1 | Michael Scott | NULL | USA | michael.scott@dundermifflinpc.com |
2 | Dwight Schrute | NULL | USA | dwight.schrute@dundermifflinpc.com |
Later, Michael moves to Spain and updates his profile. After applying SCD3, the table becomes:
ID | NAME | PREVIOUS_COUNTRY | COUNTRY | |
1 | Michael Scott | USA | Spain | michael.scott@dundermifflinpc.com |
2 | Dwight Schrute | NULL | USA | dwight.schrute@dundermifflinpc.com |
SCD Type 4 (SCD4): Separating History and Current Data
SCD Type 4 uses two separate tables to manage data changes:
- A history table that tracks all changes to an entity (much like an SCD2 table).
- A current table that contains only the most recent snapshot for each record.
✅ Strengths
- Historical tracking: Preserves a full history of changes, enabling accurate time-based analysis and reporting.
- Efficient access to current data: Queries that only need the latest version can utilize the current table directly, improving performance.
⚠️ Weaknesses
- Complexity: Maintaining two synchronized tables adds implementation and maintenance complexity.
- Storage requirements: Requires more storage to maintain historical data.
📌 Example Scenario
In our scenario, Michael moves to Spain. After applying SCD4, we obtain the following two tables:
CUSTOMER_HISTORY
:
ID | DATE_START | DATE_END | IS_DELETED | TIMESTAMP_INSERTED | TIMESTAMP_UPDATED | NAME | COUNTRY | |
1 | 2025-01-01 | 2025-02-16 | FALSE | 2025-01-01 14:24:49 | 2025-02-17 10:10:00 | Michael Scott | USA | michael.scott@dundermifflinpc.com |
1 | 2025-02-17 | 9999-12-31 | FALSE | 2025-02-17 10:10:00 | 2025-02-17 10:10:00 | Michael Scott | Spain | michael.scott@dundermifflinpc.com |
2 | 2025-01-01 | 9999-12-31 | FALSE | 2025-01-01 14:24:49 | 2025-01-01 14:24:49 | Dwight Schrute | USA | dwight.schrute@dundermifflinpc.com |
CUSTOMER_CURRENT
:
ID | NAME | COUNTRY | |
1 | Michael Scott | Spain | michael.scott@dundermifflinpc.com |
2 | Dwight Schrute | USA | dwight.schrute@dundermifflinpc.com |
SCD Type 6 (SCD6): A Hybrid Approach
SCD Type 6 is a hybrid technique that combines elements from multiple SCD types to capture a rich historical view of your data:
- SCD1: In-place updates for non-historized fields (e.g. technical metadata).
- SCD2: Full row-level history tracking.
- SCD3: Immediate past values stored in separate columns for selected fields.
✅ Strengths
- Historical tracking: Preserves a full history of changes, enabling accurate time-based analysis and reporting.
- Quick comparisions: Allows to easily compare current and previous values without scanning older rows.
⚠️ Weaknesses
- Complexity: The most complex SCD type to implement, maintain, and document correctly.
- Performance: Higher storage and query complexity. Queries must account for both row-based versions and column-level changes to accurately reflect the data for a specific timeframe.
📌 Example Scenario
We start with the same base scenario used in the SCD2 example. Michael moves to Spain on 2025-02-17, and Dwight deletes his profile on 2025-04-18.
After applying SCD6, our table structure could look like this:
ID | DATE_START | DATE_END | IS_DELETED | TIMESTAMP_INSERTED | TIMESTAMP_UPDATED | NAME | PREVIOUS_COUNTRY | COUNTRY | |
1 | 2025-01-01 | 2025-02-16 | FALSE | 2025-01-01 14:24:49 | 2025-02-17 10:10:00 | Michael Scott | NULL | USA | michael.scott@dundermifflinpc.com |
1 | 2025-02-17 | 9999-12-31 | FALSE | 2025-02-17 10:10:00 | 2025-02-17 10:10:00 | Michael Scott | USA | Spain | michael.scott@dundermifflinpc.com |
2 | 2025-01-01 | 2025-04-17 | FALSE | 2025-01-01 14:24:49 | 2025-04-18 11:11:00 | Dwight Schrute | NULL | USA | dwight.schrute@dundermifflinpc.com |
2 | 2025-04-18 | 9999-12-31 | TRUE | 2025-04-18 11:11:00 | 2025-04-18 11:11:00 | Dwight Schrute | NULL | USA | dwight.schrute@dundermifflinpc.com |
This structure:
- Maintains a full history of changes via rows (SCD2).
- Tracks the immediately previous value of the
COUNTRY
field (SCD3).
Selecting the Correct SCD Type
Selecting the appropriate SCD technique depends on your specific business needs, the characteristics and nature of your data, and system capabilities. Here are key factors to consider:
- Reporting requirements: Do you need to track historical data or only the most recent values? How far back must your history go?
- Frequency of changes: How often does your data change?
- Performance impact: As historical data grows, some SCD types (e.g., SCD2 or SCD6) may degrade performance. Define a strategy to mitigate these challenges.
- Implementation effort: Some SCD types require more effort to develop and maintain. Choose an approach according to your available resources.
- Source data quality: If your source data lacks reliable timestamps, change flags, or consistent update patterns—or if the ETL process doesn't handle them properly—historical data may become inaccurate.
Summary Table
The following table summarizes and compares the different SCD techniques:
🏷️ SCD Type | 🕒 History Tracking | ⚙️ Complexity | 🚀 Performance | 💾 Storage Consumption | 📌 Example Use Case |
SCD0 | ❌ None | 🟢 Very Low | 🟢 Very High | 🟢 Minimal | Static reference data |
SCD1 | ❌ None (Overwrite) | 🟢 Low | 🟢 High | 🟢 Minimal | Cases where historical data is irrelevant |
SCD2 | ✅ Full (row-level) | 🔴 High | 🔴 Low | 🔴 High | Auditing, compliance, and time-based analysis |
SCD3 | ☑️ Partial (column-level) | 🟡 Medium | 🟡 Medium | 🟡 Medium | Scenarios needing only the most recent change |
SCD4 | ✅ Full (separate table) | 🔴 High | 🟡 Medium | 🔴 High | When separating history improves current-data performance |
SCD6 | ✅ Full (row + column-level) | 🔴 Very High | 🔴 Low / Medium | 🔴 High | Complex auditing with need for fast comparisons between current and past states |
Conclusion
Slowly Changing Dimensions might seem straightforward at first, but as data models grow and business rules evolve, getting them right becomes more complex and critical. By carefully selecting the appropriate SCD type and implementing it consistently across your warehouse, you set the stage for reliable reporting, smoother audits, and accurate analytics.
Each technique has its strengths and trade-offs. The key is to align your choice with your data needs and available resources. While SCDs are not the only way to track historical data, when done right, they ensure your data remains both actionable and trustworthy.
Subscribe to my newsletter
Read articles from Elie Fayad directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Elie Fayad
Elie Fayad
I'm a data professional specializing in SQL and Snowflake, with a strong background in cloud migrations, data platform configuration, ETL/ELT pipeline development, data modeling, and workflow orchestration. I'm proactive, eager to learn, and passionate about tackling new challenges. I enjoy exploring emerging technologies and sharing knowledge with the community!