Dimensional Data Model for Airbnb and VRBO
Understanding the Business Domain
Before diving into the data model, it's crucial to understand the key entities and relationships in an online marketplace for short-term homestays. Here are the primary entities:
Users: Guests and hosts
Properties: Listings available for rent
Bookings: Reservations made by guests
Payments: Financial transactions associated with bookings
Reviews: Feedback provided by guests and hosts
Data Model Design
To effectively capture these entities and their relationships, I'll employ a dimensional modeling approach. This approach organizes data into fact and dimension tables, making it easier to analyze and report on various aspects of the business.
Dimension Tables
dim_users
CREATE TABLE dim_users (
user_id INT PRIMARY KEY AUTO_INCREMENT,
user_type VARCHAR(20),
registration_date DATE,
location VARCHAR(255),
effective_date DATE,
expiry_date DATE,
current_flag BOOLEAN
);
dim_hosts
CREATE TABLE dim_hosts (
host_id INT PRIMARY KEY AUTO_INCREMENT,
host_name VARCHAR(255),
host_location VARCHAR(255),
effective_date DATE,
expiry_date DATE,
current_flag BOOLEAN
);
dim_properties
CREATE TABLE dim_properties (
property_id INT PRIMARY KEY AUTO_INCREMENT,
host_id INT,
property_type VARCHAR(255),
city VARCHAR(255),
country VARCHAR(255),
effective_date DATE,
expiry_date DATE,
current_flag BOOLEAN
);
Fact Tables
fact_bookings
CREATE TABLE fact_bookings (
booking_id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
listing_id INT,
booking_date DATE,
check_in_date DATE,
check_out_date DATE,
total_price DECIMAL(10,2),
booking_status VARCHAR(20),
review_by_user VARCHAR(255),
review_by_host VARCHAR(255)
);
fact_booking_prices
CREATE TABLE fact_booking_prices (
booking_id INT,
date_id INT,
price DECIMAL(10,2),
PRIMARY KEY (booking_id, date_id)
);
Explanation:
Dimension Tables: Store reference data about users, hosts, and properties.
Fact Tables: Store transactional data related to bookings and their associated prices.
Relationships: Foreign keys connect the fact tables to dimension tables.
Data Types: Data types are chosen to match the expected values (e.g.,
int
for IDs,varchar
for text,date
for dates).
Business Questions and Analysis
Below are some business questions our model can answer with simple SQL queries.
1. What is the average booking length?
WITH booking_duration AS (
SELECT
booking_id,
DATEDIFF(check_out_date, check_in_date) AS duration
FROM
fact_bookings
)
SELECT
AVG(duration) AS average_booking_length
FROM
booking_duration;
2. Which cities are the most popular destinations for guests?
SELECT
l.city,
COUNT(*) AS total_bookings
FROM
fact_bookings b
INNER JOIN dim_listings l ON b.listing_id = l.listing_id
GROUP BY
l.city
ORDER BY
COUNT(*) DESC;
3. What is the average price difference between weekdays and weekends for bookings in a specific city?
WITH booking_data AS (
SELECT
b.booking_id,
d.day_of_week,
bp.price
FROM
fact_bookings b
INNER JOIN dim_dates d ON b.booking_date = d.date
INNER JOIN fact_booking_prices bp ON b.booking_id = bp.booking_id
WHERE
l.city = 'New York City' -- Replace with your desired city
)
SELECT
day_of_week,
AVG(price) AS average_price
FROM
booking_data
GROUP BY
day_of_week;
4. Identify the top-rated hosts based on guest reviews.
WITH host_reviews AS (
SELECT
h.host_id,
AVG(b.review_by_user) AS average_rating
FROM
dim_hosts h
INNER JOIN fact_bookings b ON h.host_id = b.host_id
WHERE
b.review_by_user IS NOT NULL
GROUP BY
h.host_id
)
SELECT
h.host_name,
hr.average_rating
FROM
dim_hosts h
INNER JOIN host_reviews hr ON h.host_id = hr.host_id
ORDER BY
hr.average_rating DESC
LIMIT 5;
Thanks for reading!
Cover photo by Andrea Davis (via Pexels)
Subscribe to my newsletter
Read articles from Analytics Engineering Digest directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Analytics Engineering Digest
Analytics Engineering Digest
Analytics Engineer Digest is dedicated to sharing expert insights and practical knowledge in the field of analytics engineering. With a focus on data modeling, data warehousing, ETL/ELT processes, and the creation of high-quality datasets, this blog serves as a resource for building scalable data architectures that drive business success. Readers can explore real-world data models, interview preparation tips, and best practices for crafting metrics, dashboards, and automated workflows using SQL, Python, and AWS tools. Analytics Engineer Digest is committed to empowering data professionals to elevate their skills and unlock the full potential of their analytics projects.