Dimensional Data Model for Airbnb and VRBO

Understanding the Business Domain

Before diving into the data model, it's crucial to understand the key entities and relationships in an online marketplace for short-term homestays. Here are the primary entities:

  • Users: Guests and hosts

  • Properties: Listings available for rent

  • Bookings: Reservations made by guests

  • Payments: Financial transactions associated with bookings

  • Reviews: Feedback provided by guests and hosts

Data Model Design

To effectively capture these entities and their relationships, I'll employ a dimensional modeling approach. This approach organizes data into fact and dimension tables, making it easier to analyze and report on various aspects of the business.

Dimension Tables

dim_users

CREATE TABLE dim_users (
    user_id INT PRIMARY KEY AUTO_INCREMENT,
    user_type VARCHAR(20),
    registration_date DATE,
    location VARCHAR(255),
    effective_date DATE,
    expiry_date DATE,
    current_flag BOOLEAN
);

dim_hosts

CREATE TABLE dim_hosts (
    host_id INT PRIMARY KEY AUTO_INCREMENT,
    host_name VARCHAR(255),
    host_location VARCHAR(255),
    effective_date DATE,
    expiry_date DATE,
    current_flag BOOLEAN
);

dim_properties

CREATE TABLE dim_properties (
    property_id INT PRIMARY KEY AUTO_INCREMENT,
    host_id INT,
    property_type VARCHAR(255),
    city VARCHAR(255),
    country VARCHAR(255),
    effective_date DATE,
    expiry_date DATE,
    current_flag BOOLEAN
);

Fact Tables

fact_bookings

CREATE TABLE fact_bookings (
    booking_id INT PRIMARY KEY AUTO_INCREMENT,
    user_id INT,
    listing_id INT,
    booking_date DATE,
    check_in_date DATE,
    check_out_date DATE,
    total_price DECIMAL(10,2),
    booking_status VARCHAR(20),
    review_by_user VARCHAR(255),
    review_by_host VARCHAR(255)
);

fact_booking_prices

CREATE TABLE fact_booking_prices (
    booking_id INT,
    date_id INT,
    price DECIMAL(10,2),
    PRIMARY KEY (booking_id, date_id)
);

Explanation:

  • Dimension Tables: Store reference data about users, hosts, and properties.

  • Fact Tables: Store transactional data related to bookings and their associated prices.

  • Relationships: Foreign keys connect the fact tables to dimension tables.

  • Data Types: Data types are chosen to match the expected values (e.g., int for IDs, varchar for text, date for dates).

Business Questions and Analysis

Below are some business questions our model can answer with simple SQL queries.

1. What is the average booking length?

WITH booking_duration AS (
    SELECT
        booking_id,
        DATEDIFF(check_out_date, check_in_date) AS duration
    FROM
        fact_bookings
)
SELECT
    AVG(duration) AS average_booking_length
FROM
    booking_duration;

2. Which cities are the most popular destinations for guests?

SELECT
    l.city,
    COUNT(*) AS total_bookings
FROM
    fact_bookings b
INNER JOIN dim_listings l ON b.listing_id = l.listing_id
GROUP BY
    l.city
ORDER BY
    COUNT(*) DESC;

3. What is the average price difference between weekdays and weekends for bookings in a specific city?

WITH booking_data AS (
    SELECT
        b.booking_id,
        d.day_of_week,
        bp.price
    FROM
        fact_bookings b
    INNER JOIN dim_dates d ON b.booking_date = d.date
    INNER JOIN fact_booking_prices bp ON b.booking_id = bp.booking_id
    WHERE
        l.city = 'New York City' -- Replace with your desired city
)
SELECT
    day_of_week,
    AVG(price) AS average_price
FROM
    booking_data
GROUP BY
    day_of_week;

4. Identify the top-rated hosts based on guest reviews.

WITH host_reviews AS (
    SELECT
        h.host_id,
        AVG(b.review_by_user) AS average_rating
    FROM
        dim_hosts h
    INNER JOIN fact_bookings b ON h.host_id = b.host_id
    WHERE
        b.review_by_user IS NOT NULL
    GROUP BY
        h.host_id
)
SELECT
    h.host_name,
    hr.average_rating
FROM
    dim_hosts h
INNER JOIN host_reviews hr ON h.host_id = hr.host_id
ORDER BY
    hr.average_rating DESC
LIMIT 5;

Thanks for reading!

Cover photo by Andrea Davis (via Pexels)

0
Subscribe to my newsletter

Read articles from Analytics Engineering Digest directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Analytics Engineering Digest
Analytics Engineering Digest

Analytics Engineer Digest is dedicated to sharing expert insights and practical knowledge in the field of analytics engineering. With a focus on data modeling, data warehousing, ETL/ELT processes, and the creation of high-quality datasets, this blog serves as a resource for building scalable data architectures that drive business success. Readers can explore real-world data models, interview preparation tips, and best practices for crafting metrics, dashboards, and automated workflows using SQL, Python, and AWS tools. Analytics Engineer Digest is committed to empowering data professionals to elevate their skills and unlock the full potential of their analytics projects.