Uncovering the Formula Behind Spotify's Top 100 Songs of 2024


Introduction
In the rapidly evolving landscape of digital music, streaming platforms like Spotify play a crucial role in shaping musical tastes and cultural trends. With millions of users and billions of streams, Spotify’s charts provide a valuable snapshot of what resonates with listeners globally. This project delves into the Spotify Top 100 Songs of 2024 to uncover insights into song popularity, distribution, artist presence, and the impact of features such as explicit content and song duration.
Problem Statement
The primary question driving this analysis is: What makes a song popular on Spotify in 2024? While traditional metrics like album sales and radio play have taken a back seat, the dynamics of streaming popularity remain complex. This project aims to demystify those dynamics by analyzing key attributes of the top-charting songs and identifying the trends behind them.
Data Collection
To begin, I used the Spotify Web API through the Spotipy Python library to extract information about songs released in 2024. I fetched data for 500 tracks and kept the 100 with the highest Spotify popularity score. The resulting dataset was uploaded to Google BigQuery, enabling scalable cloud-based querying and integration with Google Colab for further analysis.
Attributes Extracted
Track name
Artist name
Album name and type
Release date and year
Popularity score (0-100)
Explicit content indicator
Duration (in milliseconds)
Track and disc numbers
Number of contributing artists
Available markets (count)
Spotify URL
Genres
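As a rough sketch of how the attributes above could be pulled from search results, the snippet below pages through a Spotipy-style client and flattens each track object into one row. The helper names (track_to_row, fetch_tracks, top_n_by_popularity), the year:2024 query, and the output column names are assumptions for illustration, not the exact script used for this project. The sp argument is expected to be a spotipy.Spotify client or anything with a compatible search method. Genres are not part of the track object, so they are left out here.

```python
def track_to_row(track):
    """Flatten one Spotify track object (a dict) into a flat row."""
    album = track.get("album") or {}
    artists = track.get("artists") or []
    # release_date can be "2024", "2024-05" or "2024-05-17"
    release_date = album.get("release_date") or ""
    return {
        "Track": track.get("name"),
        "Artist": artists[0]["name"] if artists else None,
        "Album": album.get("name"),
        "Album_Type": album.get("album_type"),
        "Release_Date": release_date,
        "Release_Year": release_date[:4] or None,
        "Popularity": track.get("popularity"),
        "Explicit": track.get("explicit"),
        "Duration_ms": track.get("duration_ms"),
        "Track_Number": track.get("track_number"),
        "Disc_Number": track.get("disc_number"),
        "Artist_Count": len(artists),
        "Available_Markets": len(track.get("available_markets") or []),
        "Spotify_URL": (track.get("external_urls") or {}).get("spotify"),
    }


def fetch_tracks(sp, query="year:2024", total=500, page_size=50):
    """Page through search results until `total` tracks or an empty page."""
    rows = []
    for offset in range(0, total, page_size):
        page = sp.search(q=query, type="track", limit=page_size, offset=offset)
        items = page["tracks"]["items"]
        if not items:
            break  # no more results for this query
        rows.extend(track_to_row(t) for t in items)
    return rows


def top_n_by_popularity(rows, n=100):
    """Keep the n rows with the highest popularity score."""
    return sorted(rows, key=lambda r: r["Popularity"] or 0, reverse=True)[:n]
```

With real credentials, sp could be created as spotipy.Spotify(auth_manager=SpotifyClientCredentials()), which reads the client ID and secret from environment variables.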
Tools Used
Python for scripting and data analysis
Spotipy for Spotify API access
Pandas for data manipulation
Google BigQuery for cloud-based data storage and querying
Google Colab for interactive notebooks
Matplotlib and Seaborn for data visualization
Excel for final touch-ups and time conversions
Data Cleaning & Processing
After fetching the data:
Removed duplicates and incomplete records
Converted duration from milliseconds to minutes and seconds
Extracted release year and cleaned unknown genres
Standardized column names
Uploaded the cleaned CSV to BigQuery
Queried the data in Google Colab using pandas-gbq for further exploration
Genres were enriched by querying the artist endpoint. In cases where genre metadata was missing, I manually added appropriate genres based on known artist profiles.
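The cleaning steps listed above could look roughly like the following in Pandas. This is a minimal sketch under assumptions: the raw column names in RENAME, the BigQuery table name, and the project ID are all hypothetical, and the upload is shown with pandas-gbq for symmetry even though the original upload may have been done another way.

```python
import pandas as pd

# Hypothetical raw column names mapped to the names used in the plots.
RENAME = {
    "track name": "Track",
    "artist": "Artist",
    "album type": "Album_Type",
    "release date": "Release_Date",
    "popularity": "Popularity",
    "explicit": "Explicit",
    "duration ms": "Duration_ms",
}


def clean_tracks(raw: pd.DataFrame) -> pd.DataFrame:
    """Standardize names, drop duplicates and incomplete rows, derive fields."""
    df = raw.rename(columns=RENAME)
    df = df.drop_duplicates(subset=["Track", "Artist"])
    required = ["Track", "Artist", "Popularity", "Duration_ms", "Release_Date"]
    df = df.dropna(subset=required).copy()
    # Release dates can be "2024", "2024-05" or "2024-05-17"; keep the year.
    df["Release_Year"] = df["Release_Date"].astype(str).str[:4].astype(int)
    # Convert milliseconds to a "m:ss" string.
    total_seconds = df["Duration_ms"].astype(int) // 1000
    minutes = (total_seconds // 60).astype(str)
    seconds = (total_seconds % 60).astype(str).str.zfill(2)
    df["Duration"] = minutes + ":" + seconds
    return df.reset_index(drop=True)


def upload_and_reload(df, table="spotify.top_tracks_2024", project_id="my-project"):
    """Round trip through BigQuery; table and project names are placeholders."""
    import pandas_gbq  # imported lazily so clean_tracks runs without it

    pandas_gbq.to_gbq(df, table, project_id=project_id, if_exists="replace")
    return pandas_gbq.read_gbq(f"SELECT * FROM `{table}`", project_id=project_id)
```

Keeping the cleaning step separate from the BigQuery round trip makes it easy to check the transformations on a small local DataFrame before anything is uploaded.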
Exploratory Data Analysis
Key Visualizations
Bar Chart of Top Artists by Frequency
- Billie Eilish and Sabrina Carpenter topped the list with 7 appearances each.
# Imports used by all of the plots in this section; df is the cleaned top-100 DataFrame
import matplotlib.pyplot as plt
import seaborn as sns
# Create a bar plot showing the top 10 most frequent artists.
top_artists = df['Artist'].value_counts().head(10)
plt.figure(figsize=(12, 6))
sns.barplot(x=top_artists.index, y=top_artists.values, palette='viridis')
plt.xlabel('Artist')
plt.ylabel('Number of Songs in Top 100')
plt.title('Top 10 Most Frequent Artists in Spotify Top 100 (2024)')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
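To put a number on how concentrated the chart is, one option is to compute the share of tracks held by the most frequent artists. The sketch below assumes the same Artist column used in the bar plot; the function name artist_concentration is hypothetical.

```python
import pandas as pd


def artist_concentration(df: pd.DataFrame, top_k: int = 10) -> dict:
    """Share of all tracks that belong to the top_k most frequent artists."""
    counts = df["Artist"].value_counts()
    top = counts.head(top_k)
    return {
        "distinct_artists": int(counts.size),
        "top_k_tracks": int(top.sum()),
        "top_k_share": float(top.sum() / len(df)),
    }


# Toy data: two artists hold five of six tracks.
demo = pd.DataFrame({"Artist": ["A", "A", "A", "B", "B", "C"]})
print(artist_concentration(demo, top_k=2))
```

A top_k_share close to 1 would mean a handful of artists account for nearly the whole chart, while a value near top_k divided by the number of tracks would mean entries are spread evenly.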
Box Plot of Popularity by Album Type
- Singles and albums have similar medians, with singles showing more variability.
# Bivariate Analysis: Box plot of Popularity by Album Type
plt.figure(figsize=(8, 5))
sns.boxplot(data=df, x='Album_Type', y='Popularity', palette='viridis')
plt.xlabel('Album Type')
plt.ylabel('Popularity')
plt.title('Popularity Distribution by Album Type in Spotify Top 100 (2024)')
plt.show()
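The box plot comparison can also be checked numerically. The following sketch, assuming the same Album_Type and Popularity columns, reports the median and interquartile range per album type; popularity_spread is a hypothetical helper name.

```python
import pandas as pd


def popularity_spread(df: pd.DataFrame) -> pd.DataFrame:
    """Median and interquartile range of popularity per album type."""
    grouped = df.groupby("Album_Type")["Popularity"]
    summary = grouped.agg(count="count", median="median")
    # IQR = 75th percentile minus 25th percentile, the height of each box
    summary["iqr"] = grouped.quantile(0.75) - grouped.quantile(0.25)
    return summary


# Toy data for illustration only.
demo = pd.DataFrame({
    "Album_Type": ["album"] * 3 + ["single"] * 4,
    "Popularity": [80, 82, 84, 70, 80, 90, 100],
})
print(popularity_spread(demo))
```

Similar medians with a larger IQR for singles would match what the box plot shows visually.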
Scatter Plot: Popularity vs Duration
- No strong linear correlation; most of the top songs run between roughly 2.5 and 4 minutes.
# Convert duration from milliseconds to minutes
df['Duration_min'] = df['Duration_ms'] / 60000
# Scatter plot
plt.figure(figsize=(8, 5))
sns.scatterplot(data=df, x='Duration_min', y='Popularity', hue='Explicit')
plt.title('Popularity vs Duration (minutes)')
plt.xlabel('Duration (minutes)')
plt.ylabel('Popularity')
plt.show()
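The "no strong linear correlation" reading of the scatter plot can be backed by a Pearson coefficient, along with the share of tracks inside a given length band. This is a small sketch assuming the Duration_ms and Popularity columns; the function name and the default band limits are illustrative choices.

```python
import pandas as pd


def duration_summary(df: pd.DataFrame, low: float = 2.5, high: float = 4.0) -> dict:
    """Pearson r between duration and popularity, plus share of tracks in a band."""
    minutes = df["Duration_ms"] / 60000
    r = minutes.corr(df["Popularity"])  # Series.corr uses Pearson by default
    in_band = minutes.between(low, high)  # inclusive on both ends
    return {"pearson_r": float(r), "share_in_band": float(in_band.mean())}


# Toy data for illustration only.
demo = pd.DataFrame({
    "Duration_ms": [120000, 180000, 210000, 300000],
    "Popularity": [80, 90, 85, 88],
})
print(duration_summary(demo))
```

A coefficient near zero supports the absence of a linear trend, although it says nothing about non-linear patterns.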
Scatter Plot: Popularity vs Available Markets
- Most tracks are available in ~180 markets. No direct correlation with popularity.
# Bivariate Analysis: Popularity vs Available Markets
plt.figure(figsize=(8, 5))
sns.scatterplot(data=df, x='Available_Markets', y='Popularity', hue='Explicit')
plt.title('Popularity vs Available Markets')
plt.xlabel('Available Markets')
plt.ylabel('Popularity')
plt.legend(title='Explicit')
plt.grid(True)
plt.show()
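Because market counts cluster around a single value, a rank-based measure may be more informative than a plain Pearson coefficient. The sketch below computes Spearman's rho as the Pearson correlation of the ranks, which avoids extra dependencies; the helper names are hypothetical and the column names follow the plot above.

```python
import pandas as pd


def spearman_rho(x: pd.Series, y: pd.Series) -> float:
    """Spearman's rho: Pearson correlation of the ranks (ties get average ranks).

    Returns NaN when either series is constant, since its ranks have no variance.
    """
    return float(x.rank().corr(y.rank()))


def market_summary(df: pd.DataFrame) -> dict:
    """Median market count and rank correlation with popularity."""
    markets = df["Available_Markets"]
    return {
        "median_markets": float(markets.median()),
        "spearman_rho": spearman_rho(markets, df["Popularity"]),
    }


# Toy data for illustration only.
demo = pd.DataFrame({
    "Available_Markets": [180, 185, 120, 183],
    "Popularity": [90, 85, 88, 80],
})
print(market_summary(demo))
```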
Donut Chart: Explicit vs Clean Songs
- 64% of the songs are clean and 36% are explicit, so clean tracks make up nearly two-thirds of the chart.
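# Label each track without overwriting the original Explicit column
explicit_labels = df['Explicit'].astype(str).str.lower().map({'true': 'Explicit', 'false': 'Clean'})
# A wedge width below 1 leaves a hole in the middle, turning the pie into a donut;
# labels come from the value_counts index, so they always match their slices
explicit_labels.value_counts().plot(kind='pie', autopct='%1.1f%%', pctdistance=0.8, wedgeprops={'width': 0.4}, title='Explicit vs Clean Songs', figsize=(6, 6))
plt.ylabel('')
plt.show()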
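Beyond the overall split, it can help to compare popularity between the two groups directly. The following is a sketch assuming the Explicit column holds booleans or "True"/"False" strings; explicit_breakdown is a hypothetical helper name.

```python
import pandas as pd


def explicit_breakdown(df: pd.DataFrame):
    """Percentage split and popularity statistics for clean vs explicit tracks."""
    # Works for both boolean values and "True"/"False" strings
    is_explicit = df["Explicit"].astype(str).str.lower().eq("true")
    labels = is_explicit.map({True: "Explicit", False: "Clean"})
    share = (labels.value_counts(normalize=True) * 100).round(1)
    popularity = df.groupby(labels)["Popularity"].agg(["mean", "median", "max"])
    return share, popularity


# Toy data for illustration only.
demo = pd.DataFrame({
    "Explicit": [True, False, False, False],
    "Popularity": [90, 80, 85, 95],
})
share, popularity = explicit_breakdown(demo)
print(share)
print(popularity)
```

Comparing the max and median columns for the two groups is one way to check whether clean songs lead at the top of the popularity range.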
Bivariate & Multivariate Analyses
Popularity vs Available Markets
Popularity vs Track Number
Popularity vs Genre
Multivariate scatter plots combining Explicit, Duration, and Market Reach
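For the genre comparison, each track can carry several genres, so the genre field has to be split before grouping. The sketch below assumes a Genres column stored as a comma-separated string, which may not match the actual storage format; popularity_by_genre and min_tracks are hypothetical names.

```python
import pandas as pd


def popularity_by_genre(df: pd.DataFrame, min_tracks: int = 2) -> pd.DataFrame:
    """Track count and mean popularity per genre, one row per genre."""
    genre_lists = df["Genres"].fillna("").str.split(",")
    # explode() repeats each track once per genre it is tagged with
    exploded = df.assign(Genre=genre_lists).explode("Genre")
    exploded["Genre"] = exploded["Genre"].str.strip().str.lower()
    exploded = exploded[exploded["Genre"] != ""]
    summary = exploded.groupby("Genre")["Popularity"].agg(
        tracks="count", mean_popularity="mean"
    )
    # Drop genres with too few tracks to say anything about
    summary = summary[summary["tracks"] >= min_tracks]
    return summary.sort_values("tracks", ascending=False)


# Toy data for illustration only.
demo = pd.DataFrame({
    "Genres": ["pop, alt-pop", "pop", "hip-hop", None, "Pop,hip-hop"],
    "Popularity": [90, 80, 70, 60, 100],
})
print(popularity_by_genre(demo))
```

Note that a track tagged with several genres counts once for each of them, so the track counts can add up to more than the number of songs.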
Key Insights
Artist Dominance: A small number of artists dominate the chart with multiple entries.
No Length Bias: Both short and long songs can be popular; no clear advantage based on duration.
Global Availability: Most top songs are globally distributed, but a few niche tracks gain popularity with limited reach.
Content Matters: Clean songs slightly edge out explicit ones at the highest popularity levels.
Genre Patterns: Pop and hip-hop dominate, with recurring sub-genres like R&B and alt-pop.
Impact and Value
This analysis helps:
Music Marketers understand what makes a track globally appealing
Artists optimize release formats and content strategies
Streaming Platforms refine recommendation systems and playlist placements
Academics and Researchers explore correlations between song features and virality
Challenges
API limitations restricted access to full audio feature data
Genre classification inconsistencies required manual correction
Popularity is time-sensitive and can change rapidly week to week
Future Work
Integrate TikTok virality metrics
Add audio features (tempo, valence, etc.) with higher rate limits
Build an interactive dashboard using Plotly or Streamlit
Analyze listener demographics and regional trends
Conclusion
This project provides a detailed look at the structure and patterns behind Spotify’s most popular songs of 2024. While there is no single formula for success, trends around artist frequency, song length, global availability, and genre highlight the nuanced nature of music popularity in the digital age.
For questions or collaboration, contact: abyogia@gmail.com
Written by Abdüllahy