LyricsAnalyzer: Song Analysis with NLP
Structure
Introduction
The Mission
Building the NLP Pipeline
Regression Modelling
Project Pipeline and Future Scope
Conclusion
Introduction
As someone involved in both music and machine learning, I have always been intrigued by how the two can be combined. In my most recent project, I analyzed Hindi song lyrics with NLP, using the songs' acoustic features as labels. The aim was to predict different sound characteristics of a song based entirely on its lyrics.
The Mission
Genres depend on the listener's perception, which creates problems. What’s “rock” to one person could be “indie” to another. Thus, I decided to focus on three measurable aspects of music: acousticness, danceability, and energy. These labels are less prone to personal bias, which addresses the issue.
Building the NLP Pipeline
The first phase of the project was designing an NLP pipeline that captures the essence and meaning of a song from its lyrics. The idea was simple: find the lyrics, vectorize them using suitable techniques, and use machine learning models to predict acoustic properties. Lyrics Genius and Musixmatch became my go-to resources for gathering lyrics, though they came with limitations. While the APIs provided access to the lyrics, Musixmatch exposed only 30% of the lyrics, which was a roadblock.
This significantly reduces the text available for each song, but since the dataset is large, I trained on the partial lyrics anyway. The API could still be used for professional purposes, though at this point Musixmatch does not provide support for companies or projects that are in their research and development phase, or for personal projects.
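The vectorization technique is not specified above, so the sketch below assumes TF-IDF as one plausible choice. It uses only the Python standard library, splits on whitespace (a real pipeline for Hindi text would need proper tokenization and punctuation handling), and the lyric snippets are hypothetical transliterated examples, not data from the project.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectorize(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)

    # Document frequency: in how many lyrics each term appears.
    doc_freq = Counter(tok for doc in tokenized for tok in set(doc))

    # Smoothed inverse document frequency, so every term keeps a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty lyrics
        vectors.append([counts[t] / total * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Hypothetical transliterated snippets, for illustration only.
lyrics = [
    "dil dil dhadke",
    "raat chandni raat",
    "dil ki raat",
]
vocabulary, vectors = tfidf_vectorize(lyrics)
```

Each song becomes a fixed-length numeric vector, one entry per vocabulary term, which is the form the regression models expect as input.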
Regression Modelling
I built a regression model capable of predicting specific, quantitative musical attributes. The challenge was to forecast an exact value for a given song based only on its lyrics.
To implement this, I started by using the vectorized lyrics from my NLP pipeline as inputs for the model. The target outputs were the acoustic features of the songs, which I had extracted from the dataset.
After some experimentation, I settled on using models such as linear regression, Ridge regression and gradient-boosting algorithms. The linear model helped establish a baseline, giving an understanding of how lyrics and acoustic features might correlate.
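To make the baseline concrete, here is a minimal stdlib-only sketch of linear and Ridge regression fitted by gradient descent. It is an illustration, not the project's actual code: in practice a library implementation would normally be used, the toy data is made up, and the loss scaling (mean squared error plus `alpha` times the squared weight norm) is a choice made for this sketch that libraries may scale differently.

```python
def fit_ridge(X, y, alpha=0.0, lr=0.1, epochs=5000):
    """Fit weights and bias by gradient descent on MSE + alpha * ||w||^2.

    alpha=0.0 gives ordinary linear regression; alpha > 0 gives Ridge.
    """
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        # Prediction errors under the current parameters.
        errors = [
            sum(w * x for w, x in zip(weights, row)) + bias - target
            for row, target in zip(X, y)
        ]
        for j in range(n_features):
            grad = 2.0 / n_samples * sum(e * row[j] for e, row in zip(errors, X))
            grad += 2.0 * alpha * weights[j]  # L2 penalty; the bias is not penalized
            weights[j] -= lr * grad
        bias -= lr * 2.0 / n_samples * sum(errors)
    return weights, bias


def predict(X, weights, bias):
    """Apply the fitted linear model to each feature row."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in X]


# Toy data (made up): two lyric features per song, one target such as energy.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
y = [0.3, 0.6, 0.8, 0.1]

w_lin, b_lin = fit_ridge(X, y, alpha=0.0)    # plain linear baseline
w_ridge, b_ridge = fit_ridge(X, y, alpha=0.1)  # same model with shrinkage
```

The Ridge weights come out smaller in magnitude than the plain linear ones, which is the point of the penalty: with high-dimensional lyric vectors, shrinkage helps keep the model from fitting noise.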
I split my data into training and test sets to make sure the model generalizes well to unseen data. My hypothesis was that certain patterns in the lyrical content could be linked to musical properties. For example, songs with calmer, more introspective lyrics could trend towards higher acousticness, while energetic, upbeat lyrics might signal higher danceability and energy.
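A hold-out split can be sketched in a few lines. The helper name `split_train_test`, the 80/20 ratio, and the fixed seed are all assumptions for illustration; the split proportions actually used are not stated here.

```python
import random


def split_train_test(features, targets, test_fraction=0.2, seed=42):
    """Shuffle sample indices reproducibly and hold out a fraction for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)

    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Index features and targets with the same indices so pairs stay aligned.
    X_train = [features[i] for i in train_idx]
    X_test = [features[i] for i in test_idx]
    y_train = [targets[i] for i in train_idx]
    y_test = [targets[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Placeholder data: ten "songs" with one feature and one target each.
features = [[i] for i in range(10)]
targets = [i / 10 for i in range(10)]
X_train, X_test, y_train, y_test = split_train_test(features, targets)
```

The model is fitted only on the training portion, and the held-out songs are used solely to measure how well it generalizes.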
Model evaluation involved measuring the root mean squared error (RMSE) and mean absolute error (MAE) to assess prediction accuracy. By using regression modelling, I aimed to provide a more objective understanding of songs.
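For reference, both metrics are easy to compute by hand. The sketch below uses made-up values for a single label; the numbers are placeholders, not results from the project.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Made-up values for one label (for example, danceability).
y_true = [0.2, 0.5, 0.9]
y_pred = [0.3, 0.5, 0.5]

mae = mean_absolute_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)
```

RMSE is never smaller than MAE, and the gap widens when a few predictions are far off, so reporting both gives a sense of whether errors are spread evenly or dominated by outliers.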
Project Pipeline and Future Scope
In the project, the vectorized lyrics, once finalized, would be fed into the model to predict the three labels. Each of these labels is on a spectrum, and the goal is to predict the exact point on this spectrum for each track. The model had to learn from the language and context of the lyrics to understand what kind of acoustic properties the track might have.
Predicting acousticness, danceability, and energy from the lyrics alone is a challenge in itself. Lyrics often set the emotional tone for a song but don't always correspond directly to the production or composition style. This project could therefore be integrated with my previous project on genre classification using MFCCs. That way, both the audio features and the lyrics could be used to estimate the quantitative features more accurately.
Conclusion
I built a tool that can analyze songs based on measurable attributes rather than subjective genres. I believe this project can enable more accurate music discovery and recommendation systems, ones that are driven by data rather than subjectivity. The ultimate goal is for recommendation systems to offer suggestions based on these measurable attributes.
Written by
Advait
Passionate AI Dev | Proficient in Deep Learning | GenAI Enthusiast