Dimensional Data Model for App Help Center Page
Introduction
The success of a digital product heavily depends on providing users with efficient support and guidance. One of the key tools for achieving this is a well-structured help center that users can navigate to find solutions to their problems. For businesses, understanding how users interact with the help center is crucial for improving the overall user experience and engagement. By tracking key events such as logins, page views, and specific interactions (e.g., likes, comments), businesses can identify areas for improvement and measure the success of various content types.
In this blog post, we will explore a dimensional data model designed for the help center page of an advertising platform. We will discuss the facts and dimensions that make up the data model and demonstrate how this model can be leveraged to generate valuable insights through analytics. Additionally, we’ll present the PostgreSQL schemas and provide SQL queries to answer some of the most common business questions.
Use Cases for the Help Center Data Model
Our dimensional model allows us to answer several important business questions, such as:
How many users visit the help center daily or monthly?
Which pages are the most popular and engaging?
What is the bounce rate for specific help center pages?
Which pages are receiving the most likes or comments?
How can we identify underperforming content based on low engagement?
These insights can help improve user experience by highlighting which content is most useful, identifying under-engaged sections, and providing a clearer understanding of how users navigate the platform.
The Facts and Dimensions
We have chosen three fact tables to capture different levels of interaction on the platform: fact_logins
, fact_page_views
, and fact_events
. This decision allows us to capture user engagement in a hierarchical manner: users log in, view pages, and perform specific events (such as clicking, liking, or commenting). Each fact table is associated with dimensions that provide context to the events. This structure allows us to maintain a high level of granularity while also keeping the model flexible for various types of analysis.
The dimensions chosen provide detailed information about users, accounts, pages, page categories, and event actions. These dimensions are designed to enrich the facts with relevant metadata that makes it easier to slice and dice the data, while also maintaining historical accuracy.
PostgreSQL Table Schemas
Fact Tables
-- fact_logins
CREATE TABLE fact_logins (
login_id SERIAL PRIMARY KEY,
user_id INT REFERENCES dim_users(user_id),
account_id INT REFERENCES dim_accounts(account_id),
login_start_time TIMESTAMP,
login_end_time TIMESTAMP,
login_duration_seconds INT
);
-- fact_page_views
CREATE TABLE fact_page_views (
page_view_id SERIAL PRIMARY KEY,
login_id INT REFERENCES fact_logins(login_id),
page_id INT REFERENCES dim_pages(page_id),
page_category_id INT REFERENCES dim_page_categories(page_category_id),
page_view_start_time TIMESTAMP,
page_view_end_time TIMESTAMP,
page_view_duration_seconds INT
);
-- fact_events
CREATE TABLE fact_events (
event_id SERIAL PRIMARY KEY,
page_view_id INT REFERENCES fact_page_views(page_view_id),
event_action_id INT REFERENCES dim_event_action_details(event_action_id),
event_start_time TIMESTAMP,
event_duration_seconds INT
);
Dimension Tables
-- dim_users
CREATE TABLE dim_users (
user_id SERIAL PRIMARY KEY,
first_name VARCHAR(100),
last_name VARCHAR(100),
email VARCHAR(255),
date_of_birth DATE,
account_id INT REFERENCES dim_accounts(account_id)
);
-- dim_accounts
CREATE TABLE dim_accounts (
account_id SERIAL PRIMARY KEY,
account_name VARCHAR(255),
industry VARCHAR(100),
region VARCHAR(50),
created_date DATE
);
-- dim_pages
CREATE TABLE dim_pages (
page_id SERIAL PRIMARY KEY,
page_name VARCHAR(255),
page_description TEXT
);
-- dim_page_categories
CREATE TABLE dim_page_categories (
page_category_id SERIAL PRIMARY KEY,
category_name VARCHAR(100)
);
-- dim_event_action_details
CREATE TABLE dim_event_action_details (
event_action_id SERIAL PRIMARY KEY,
action_name VARCHAR(100) -- e.g., 'click', 'like', 'comment'
);
Business Questions and SQL Queries
Below are some of the questions our data model can answer
1. Daily Active Users (DAU)
SELECT COUNT(DISTINCT user_id) AS daily_active_users
FROM fact_logins
WHERE login_start_time::DATE = CURRENT_DATE;
2. Monthly Active Users (MAU)
SELECT COUNT(DISTINCT user_id) AS monthly_active_users
FROM fact_logins
WHERE DATE_TRUNC('month', login_start_time) = DATE_TRUNC('month', CURRENT_DATE);
3. Most Active Accounts
SELECT account_id, COUNT(login_id) AS login_count
FROM fact_logins
GROUP BY account_id
ORDER BY login_count DESC
LIMIT 5;
4. Most Popular Categories
SELECT pc.category_name, COUNT(pv.page_view_id) AS page_views
FROM fact_page_views pv
INNER JOIN dim_page_categories pc
ON pv.page_category_id = pc.page_category_id
GROUP BY pc.category_name
ORDER BY page_views DESC
LIMIT 5;
5. Least Popular Pages
SELECT p.page_name, COUNT(pv.page_view_id) AS page_views
FROM fact_page_views pv
INNER JOIN dim_pages p
ON pv.page_id = p.page_id
GROUP BY p.page_name
ORDER BY page_views ASC
LIMIT 5;
6. Pages with Highest Bounce Rate
SELECT p.page_name,
COUNT(CASE WHEN pv.page_view_duration_seconds < 30 THEN 1 END)::DECIMAL / COUNT(pv.page_view_id) AS bounce_rate
FROM fact_page_views pv
INNER JOIN dim_pages p
ON pv.page_id = p.page_id
GROUP BY p.page_name
ORDER BY bounce_rate DESC
LIMIT 5;
7. Most Liked Pages
SELECT p.page_name, COUNT(e.event_id) AS like_count
FROM fact_events e
INNER JOIN dim_event_action_details ea
ON e.event_action_id = ea.event_action_id
INNER JOIN dim_pages p
ON e.page_view_id = p.page_id
WHERE ea.action_name = 'like'
GROUP BY p.page_name
ORDER BY like_count DESC
LIMIT 5;
8. Most Commented Pages
SELECT p.page_name, COUNT(e.event_id) AS comment_count
FROM fact_events e
INNER JOIN dim_event_action_details ea
ON e.event_action_id = ea.event_action_id
INNER JOIN dim_pages p
ON e.page_view_id = p.page_id
WHERE ea.action_name = 'comment'
GROUP BY p.page_name
ORDER BY comment_count DESC
LIMIT 5;
9. Most Active Users
SELECT user_id, COUNT(login_id) AS logins
FROM fact_logins
GROUP BY user_id
ORDER BY logins DESC
LIMIT 5;
10. Least Active Users
SELECT user_id, COUNT(login_id) AS logins
FROM fact_logins
GROUP BY user_id
ORDER BY logins ASC
LIMIT 5;
Conclusion
A well-designed data model is key to extracting meaningful insights from user interactions. For an advertising platform's help center, we created a model that captures user engagement at three different levels: logins, page views, and specific events. This dimensional model enables rapid analytics by structuring data in a way that is flexible, easy to query, and optimized for performance.
By leveraging this data model, business analysts can answer key questions about how users are interacting with the help center, which content is resonating the most, and where improvements can be made. This ultimately drives a more engaged and satisfied user base, contributing to the overall success of the platform.
Thanks for reading!
Photo credit: Nutrisense Inc via Pexels
Subscribe to my newsletter
Read articles from Analytics Engineering Digest directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Analytics Engineering Digest
Analytics Engineering Digest
Analytics Engineer Digest is dedicated to sharing expert insights and practical knowledge in the field of analytics engineering. With a focus on data modeling, data warehousing, ETL/ELT processes, and the creation of high-quality datasets, this blog serves as a resource for building scalable data architectures that drive business success. Readers can explore real-world data models, interview preparation tips, and best practices for crafting metrics, dashboards, and automated workflows using SQL, Python, and AWS tools. Analytics Engineer Digest is committed to empowering data professionals to elevate their skills and unlock the full potential of their analytics projects.