Madrid Real Estate Investment Analysis


π Overview
In this project, I analyze short-term rental investment opportunities in Madrid using Python, geospatial analytics, and Power BI.
By combining Airbnb listings, property pricing data from Idealista, and official geospatial boundaries, the analysis identifies high-yield neighborhoods, occupancy trends, and investment hotspotsβall with a data-first mindset.
π GitHub repository
π― Objectives
Analyze available public data sources
Find insights that help understand the characteristics of the market in this city
Guide the investment teamβs search efforts
Especially regarding the main focus areas: rental prices, occupancy levels, and purchase prices
Estimate and compare rental yield by district
Build a Power BI dashboard to explore key investment drivers
Identify undervalued districts with high potential returns
π Business Assumptions & KPIs
The profitability of short-term rentals relies primarily on three levers:
Rental price β the higher the nightly rate, the higher the income
Occupancy β the more days the property is rented, the better
Purchase price β the cheaper the acquisition, the higher the ROI
The KPIs we use in this analysis are:
Occupancy rate β computed as the % of the year the property is not available (proxy for booked days)
Nightly rental price β directly from Airbnb listings
Estimated property value β based on average β¬/mΒ² from Idealista, adjusted with a 25% discount for negotiation leverage
Each KPI is engineered from raw data or external sources using Python.
π§ Tools & Techniques
Python: pandas, numpy, matplotlib, seaborn, sqlalchemy, folium
SQLite database construction from raw CSVs
Geospatial analysis using latitude/longitude coordinates and distance calculations
Power BI for final business-friendly dashboard
π Executive Summary & Visual Highlights
The interactive dashboard summarizes over 10,000 listings and uncovers pricing, occupancy, and profitability patterns across Madridβs 21 districts.
π§Ύ Download the full Power BI dashboard PDF
π‘ Top Districts by Rental Yield (%)
Based on the dashboard summary:
Villaverde β 17.92%
Puente de Vallecas β 17.18%
San Blas β Canillejas β 16.36%
β These districts offer the highest rental profitability, combining low acquisition costs with solid demand signals.
π Location Impact β Does Centrality Guarantee Performance?
Proximity to Puerta del Sol does not guarantee higher occupancy or pricing.
Even well-located units underperform if poorly optimized.
π§ Location matters β but only when combined with quality, affordability, and guest fit.
πΆ Occupancy vs Purchase Price by District
Lower-cost districts show better rental efficiency.
- San Blas β Canillejas stands out due to event-driven demand (IFEMA, Wanda Stadium)
π§ Property Type Insights β What Drives Occupancy?
πΌ Guest Capacity and Price Dynamics
Listings optimized for 4 guests yield the highest total rental prices relative to purchase cost.
π Larger groups donβt always yield better ROI β the 4-guest configuration hits the sweet spot.
π Bedrooms & Beds & Accommodates & Room Type
1-bedroom apartments reach the highest occupancy rates, confirming demand from solo travelers and couples
Listings with 1β2 beds perform best. Adding more beds often reduces occupancy, likely due to inefficiencies in layout or pricing flexibility
Occupancy peaks at 1β3 guests accommodated
Private rooms achieve the highest utilization overall
π§ Strategic Recommendation
Madrid presents a spectrum of opportunities:
High-end positioning in Salamanca (blue dot)
Mid-tier stability in Moncloa β Aravaca (orange dot)
High-potential emerging zone in San Blas β Canillejas (green dot)
β Leverage rental volatility for short-term gains in event-active areas
π Geospatial Analysis
Although a GeoJSON file was available, mapping was performed using listing-level coordinates (latitude and longitude). Distance from Puerta del Sol was used as a centrality proxy (see formula in the next section).
π§Ύ Methodology
π 1. Data Sources
Official GeoJSON for Madrid districts
π οΈ 2. Data Engineering & Datamart Creation
We created a local SQLite database (airbnb.db
) from CSV files containing listings, availability, reviews, and price data.
We then:
Performed individual quality checks:
Variable types
Null handling via custom imputation functions
Duplicate detection
Categorical and numerical profiling
Built the analytical datamart by joining cleaned tables
Prepared analysis variables:
price
(validated per room type)occupancy
derived fromavailability_365
purchase_price
estimated from Idealistaβs β¬/mΒ²
π 3. Feature Engineering Examples
ποΈ Adjusted Total Price
def create_total_price(record):
if (record.beds > 1) & ((record.room_type == 'Private room') | (record.room_type == 'Shared room')):
return record.price * record.beds * 0.8
return record.price
df['total_price'] = df.apply(create_total_price, axis=1)
π Occupancy Transformation
df['occupancy'] = ((365 - df.availability_365) / 365 * 100).astype('int')
π’ Estimating Property Size via Bedrooms
conditions = [
df.bedrooms == 1,
df.bedrooms == 2,
df.bedrooms == 3,
df.bedrooms == 4,
df.bedrooms > 4
]
results = [50,70,90,120,150]
df['m2'] = np.select(conditions, results, default = -999)
π° Purchase Price Estimate
df['purchase_price'] = df.m2 * df.price_m2 * 0.75
π Distance to Puerta del Sol (Haversine)
from math import radians, cos, sin, asin, sqrt
def haversine(lat1, lon1, lat2, lon2):
R = 6372.8 # In kilometers β if you're using miles, replace it with 3959.87433
dLat = radians(lat2 - lat1)
dLon = radians(lon2 - lon1)
lat1 = radians(lat1)
lat2 = radians(lat2)
a = sin(dLat/2)**2 + cos(lat1)*cos(lat2)*sin(dLon/2)**2
c = 2*asin(sqrt(a))
return R * c
# The coordinates of Puerta del Sol will be lat1 and lon1
lat1 = 40.4167278
lon1 = -3.7033387
df['poi_sol'] = df.apply(lambda record: haversine(lat1,lon1,record.latitude,record.longitude),axis = 1)
π§ 4. Data Enrichment & Discretization
To better capture relationships, we discretized numerical features:
accommodates
,beds
,bedrooms
,number_of_reviews
This allowed more robust exploratory visualizations and performance groupings.
π€ 5. Output Dataset for Power BI
After data cleaning and feature engineering, we executed the notebook RE_Analysis_Madrid_Executive_Summary.ipynb
, which:
Selects a subset of cleaned and enriched features from the local database
Generates the final CSV file:
RE_Analysis_Madrid_dataset.csv
This dataset serves as the input for the Power BI dashboard and includes:
Cleaned rental-related variables
Geospatial features like
poi_sol
(distance to Sol)All derived KPIs for investment analysis
π Business Dashboard
The dashboard was designed in Power BI to communicate key insights to business users in a simple, interactive format.
π Main Pages:
π Investment Outlook
π Market and Property Drivers
π Location Impact (Distance to Puerta del Sol)
ποΈ Occupancy Drivers by Property and Price
π‘ District-level Investment Outlook
π Example dashboard preview:
π§Ύ Download the full PDF version
π Insights & Recommendations
The most profitable opportunities arise in peripheral districts where acquisition costs remain low but demand remains solid β such as Villaverde, Puente de Vallecas, and San Blas β Canillejas
Simplicity outperforms luxury β listings with 1β2 beds and basic layouts achieve higher occupancy rates than oversized or overdesigned properties
Room type is a key differentiator: private rooms and 1-bedroom units consistently deliver higher utilization, especially among solo travelers and couples
Proximity to city center is not a guarantee of performance β distance metrics like proximity to Puerta del Sol must be paired with KPIs such as yield and occupancy to detect arbitrage zones
Geospatial analysis and pricing segmentation reveal patterns invisible to spreadsheets β enabling smarter, location-informed investment strategies
π§ What I Learned
How to build a full data-to-insight pipeline for real estate β from raw public data to BI-ready dashboards
How to combine Python analytics and Power BI storytelling to deliver actionable insights for investment decisions
How to engineer domain-specific KPIs (e.g., yield, occupancy) and apply them in geospatial, financial, and behavioral contexts
How to translate complex datasets into clear narratives that bridge the gap between data science and business needs
π‘ This project helped me sharpen my ability to deliver business-ready insights from raw, messy data β a skill at the core of real-world data science.
π Explore the project
π GitHub repo
π Part of the Data Projects Portfolio
Subscribe to my newsletter
Read articles from Tayfur Akkaya Clavijo directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Tayfur Akkaya Clavijo
Tayfur Akkaya Clavijo
I'm an AI, data and finance enthusiast. Passionate about lifelong learning and cross-sector innovation, Iβm always looking to collaborate on impactful projects at the intersection of data, strategy, and technology.