Dimensional Modelling

Ashish GSAshish GS
6 min read

What is Dimensional Modelling in Data Warehouses?

Dimensional modelling (DM) is a data structure technique optimised for data storage in a Data warehouse. The purpose of dimensional modelling is to optimise the database for faster retrieval of data.

A dimensional model in the warehouse is designed to read, summarise, and analyse numeric information like values, balances, counts, weights, etc. in a data warehouse. In contrast, relation models are optimised for addition, updation and deletion of data in a real-time Online Transaction System.

Elements of Dimensional Modelling

  • Fact

  • Dimensions

  • Attributes

  • Fact Table

  • Dimension Table

Fact

Facts are the measurements/metrics or facts from your business process. For a school (online coaching centres) business process, a measurent would be sales number by different types of courses/products.

Dimension

Dimension provides the context surrounding a business process event. In simple terms, they give who, what, and where of a fact. In the Sales business process, for the fact quarterly sales number, dimensions would be

  • Who – Customer Names

  • Where – Location

  • What – Product Name

Attributes

The Attributes are the various characteristics of the dimension in dimensional data modelling.

In the Location dimension, the attributes can be

  • State

  • Country

  • Zipcode etc.

Attributes are used to search, filter, or classify facts. Dimension tables contain attributes.

Fact Table

A fact table is a primary table in dimension modelling.

A Fact Table contains

  • Measurements/facts

  • Foreign key to the dimension table

Dimension Table

  • A dimension table contains the dimensions of a fact.

  • They are joined to fact table via a foreign key.

  • Dimension tables are de-normalised tables.

  • The Dimension Attributes are the various columns in a dimension table.

  • Dimensions offers descriptive characteristics of the facts with the help of their attributes.

  • There is no set limit given for the number of dimensions.

  • The dimension can also contain one or more hierarchical relationships.

Steps of Dimensional Modelling

The accuracy in creating your Dimensional modelling determines the success of your data warehouse implementation. Here are the steps to create Dimension Model

  1. Identify Business Process / Objective

  2. Identify Grain (level of detail)

  3. Identify Dimensions

  4. Identify Facts

  5. Building of Schema

Step 1: Identify the Business Process / Objective

  • Before modelling the data, you should find the types of dimensional modelling appropriate for your data model. The dimensional modelling process (or any data modelling) begins with the identification of the business process that you want to track.

  • For example: In our case, we would want to track the revenue of each School by different types of products.

  • This is the most important step of the Data Modelling process, and a failure here would have cascading and irreparable defects.

Step 2: Identify Grain (level of detail)

  • The Grain describes the level of detail for the business problem/solution. It is the process of identifying the lowest level of information for any table in your data warehouse.

  • If a table contains transaction data for every day, then it should be daily granularity.

  • If a table contains total sales data for each month, then it has monthly granularity.

During this stage, you answer questions like

  1. Do we need to store all the available products or just a few types of products? This decision is based on the business processes selected for the Data warehouse.

  2. Do we store the product sale information on a monthly, weekly, daily or hourly basis? This decision depends on the nature of the reports requested.

  3. How do the above two choices affect the database size?

  • For example Admin wants to know the sales of each product in different locations on a daily basis. So the grain is “product sale information by location by date”.

Step 3: Identify the Dimension

Dimensions are nouns like date, school, courses, etc. These dimensions are where all the data should be stored. For example, the courses' dimension may contain data like name, price etc.

  • For example Admin wants to know sales of each product in different locations daily

  • Dimensions: Product, Location and Time

  • Attributes: For Product: Product key (Foreign Key), Name, Type, Properties

  • Hierarchies: For Location: Country, State, City, Street Address, Name

Step 4: Identify the Fact

  • This step is co-associated with the business users of the system because this is where they get access to data stored in the data warehouse. Most of the fact table rows are numerical values like price or count or rate of percentage etc.

  • For example Admin wants to know the sales of each product in different locations on a daily basis. So the fact here is the Sum of Sales by product by location by time.

Step 5: Build the schema

In this step, you implement the Dimension Model. There are two popular schemas

  1. Star Schema

    The star schema architecture is easy to design. It is called a star schema because the diagram resembles a star, with points radiating from a centre. The centre of the star consists of the fact table, and the points of the star are dimension tables.

    The fact tables in a star schema which is the third normal form whereas dimensional tables are de-normalised.

  2. Snowflake Schema

    The snowflake schema is an extension of the star schema. In a snowflake schema, each dimension is normalised and connected to more dimension tables.

What is a Star Schema?

Star Schema is the data warehouse, in which the centre of the star can have one fact table and a number of associated dimension tables. It is known as star schema as its structure resembles a star. The Star Schema data model is the simplest type of Data Warehouse schema. It is also known as Star Join Schema and is optimized for querying large data sets.

What is a Snowflake Schema?

Snowflake Schema in the data warehouse is a logical arrangement of tables in a multidimensional database such that the ER diagram resembles a snowflake shape. A Snowflake Schema is an extension of a Star Schema, and it adds additional dimensions. The dimension tables are normalized which splits data into additional tables.

Following is a key difference between Snowflake schema vs Star schema:

Star SchemaSnowflake Schema
Hierarchies for the dimensions are stored in the dimensional table.Hierarchies are divided into separate tables.
It contains a fact table surrounded by dimension tables.One fact table is surrounded by dimension tables which are in turn surrounded by dimension tables.
In a star schema, only a single join creates the relationship between the fact table and any dimension tables.A snowflake schema requires many joins to fetch the data.
Simple DB Design.Complex DB Design.
Denormalized Data structure and queries run faster but may require more storage space.Normalized Data Structure, hence will require lesser storage space.
High level of Data redundancyVery low-level data redundancy
A single Dimension table contains aggregated data.Data Split into different Dimension Tables.
Cube processing is faster.Cube processing might be slow because of the complex join.
Offers higher performing queries using Star Join Query Optimization.
0
Subscribe to my newsletter

Read articles from Ashish GS directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Ashish GS
Ashish GS

I am a backend developer from Bengaluru