How I Merged My Love for Music and Coding in One Project: The Story of Building Raga-Match.

Rittika
5 min read

Music has always been an integral part of my life. The intricate melodies of Hindustani classical music have intrigued me since childhood, and over the years my appreciation for the subtle nuances of ragas, and the way they can evoke particular moods and emotions, only deepened. Then my life took a turn: I began coding, fell in love with programming, AI, and machine learning, and somewhere in the back of my mind I always wanted to combine the two interests.

That’s when I had an idea: What if I could build an AI-powered tool that could recognize Hindustani classical ragas from any given audio? Something like Shazam, but for classical Indian music.

The Spark That Led to Raga-Match

I started researching how music recognition works, digging into the underlying technology behind apps like Shazam. The key to their success was something called spectrogram-based audio fingerprinting, where an audio file is broken down into its frequency components over time, creating a visual representation of sound.
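The core idea is simple to sketch: the short-time Fourier transform slices a signal into overlapping windows and measures the energy in each frequency band, producing a time-frequency grid. A minimal illustration with SciPy, run on a synthetic tone since the real recordings aren't included here:

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic stand-in for an audio file: 1 second of a 440 Hz tone at 22.05 kHz.
sr = 22050
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440.0 * t)

# Break the signal into overlapping windows and measure energy per frequency bin.
freqs, times, spec = spectrogram(audio, fs=sr, nperseg=1024, noverlap=512)

# spec is a (frequency x time) grid; the strongest bin should sit near 440 Hz.
peak_freq = freqs[spec.mean(axis=1).argmax()]
```

Fingerprinting systems like Shazam then hash the peaks of this grid; for raga recognition, the grid itself becomes the model's input.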

But Hindustani classical music is very different from mainstream Western music. Unlike fixed-scale compositions, ragas are fluid, with intricate ornamentations (meend, gamak) and variations in tempo (layakari). This made the challenge even more exciting: I couldn’t just replicate Shazam’s algorithm; I had to tailor the approach specifically to classical music.

The Data Problem: No Dataset, No Problem?

Unlike Western music, which has well-documented datasets, Hindustani classical music doesn’t have a structured, easily accessible dataset for raga recognition. This was my first major roadblock. I had to collect audio samples manually, track down recording snippets, and organize them all into a single dataset.

At first, it seemed like a simple task—until I started encountering audio encoding errors while processing the files. Different recording formats, metadata issues, and even missing audio links forced me to update file paths multiple times. But after weeks of trial and error, I finally compiled a working dataset, which I later uploaded to Kaggle. You can check it by following the link below:
🎵Hindustani Classical Music Raga Dataset
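Even the folder layout was a design decision. One simple convention (a hypothetical sketch, not necessarily the Kaggle dataset's actual layout) is one directory per raga, with a small helper that walks the tree and produces (file, label) pairs for training:

```python
from pathlib import Path
import tempfile

def build_manifest(root: Path) -> list[tuple[str, str]]:
    """Walk a dataset laid out as root/<raga_name>/<clip>.wav, return (path, label) pairs."""
    pairs = []
    for raga_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for clip in sorted(raga_dir.glob("*.wav")):
            pairs.append((str(clip), raga_dir.name))
    return pairs

# Demo on a throwaway directory tree: two ragas, three (empty) clips.
root = Path(tempfile.mkdtemp())
for raga, clips in {"yaman": ["a.wav", "b.wav"], "bhairavi": ["c.wav"]}.items():
    (root / raga).mkdir()
    for clip in clips:
        (root / raga / clip).touch()

manifest = build_manifest(root)
labels = [label for _, label in manifest]
```

Keeping paths derived from one root like this is also what saves you from the repeated path-fixing I ran into.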

The First Roadblock: Dealing with Noisy Audio

One of the first problems I encountered was the quality of audio recordings. Unlike studio-produced songs, many Hindustani classical performances are recorded live, sometimes with background noise or accompanying instruments overpowering the vocals.

To tackle this, I integrated noise reduction techniques using Librosa and SciPy. I applied:

✔ Preprocessing filters to remove background noise

✔ Normalization techniques to balance volume levels

✔ Spectrogram generation to highlight important frequency patterns
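In outline, that cleanup chain can be sketched with SciPy and NumPy (my pipeline also used Librosa; this is a simplified stand-in on a synthetic signal, and the 100 Hz cutoff is an illustrative choice, not a tuned value):

```python
import numpy as np
from scipy.signal import butter, sosfilt, spectrogram

sr = 22050
t = np.arange(sr) / sr
# Synthetic "recording": a 440 Hz vocal line buried under 50 Hz mains hum.
clean = 0.3 * np.sin(2 * np.pi * 440.0 * t)
hum = 0.8 * np.sin(2 * np.pi * 50.0 * t)
noisy = clean + hum

# 1. Preprocessing filter: 4th-order Butterworth high-pass removes low rumble.
sos = butter(4, 100.0, btype="highpass", fs=sr, output="sos")
filtered = sosfilt(sos, noisy)

# 2. Normalization: scale so the loudest sample sits at +/-1.
normalized = filtered / np.max(np.abs(filtered))

# 3. Spectrogram: the time-frequency representation the model actually sees.
freqs, times, spec = spectrogram(normalized, fs=sr, nperseg=1024, noverlap=512)
peak = freqs[spec.mean(axis=1).argmax()]  # dominant frequency after cleanup
```

After filtering, the dominant frequency is the 440 Hz "vocal" rather than the hum, which is exactly the effect you want before feature extraction.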

Building the Model: CNN + LSTMs for Raga Identification

Once I had the dataset, it was time to train the AI model. Hindustani classical music is vastly different from Western music because:

  • Ragas evolve over time, unlike pop songs with fixed chord progressions.

  • Ornamentations like meend and gamak, plus the tanpura drone, make feature extraction difficult.

  • There are no "exact" labels for ragas, as performances often mix elements of multiple ragas.

Instead of using traditional audio fingerprinting, I decided to build a deep learning model that could analyze Mel-Spectrograms and MFCCs (Mel-Frequency Cepstral Coefficients). These are visual representations of sound, making them perfect for a Convolutional Neural Network (CNN) + LSTM (Long Short-Term Memory) model.

  • CNN layers helped extract spatial features from the spectrograms.

  • LSTMs captured temporal dependencies in how a raga unfolds over time.
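A minimal PyTorch sketch of this kind of hybrid (layer sizes and the class count are illustrative, not the real architecture): the CNN compresses the spectrogram into a sequence of frame features, and the LSTM reads that sequence left to right.

```python
import torch
import torch.nn as nn

class RagaCNNLSTM(nn.Module):
    """Toy CNN+LSTM classifier over mel-spectrograms of shape (batch, 1, mels, frames)."""

    def __init__(self, n_mels: int = 64, n_ragas: int = 10):
        super().__init__()
        # CNN: extract local time-frequency patterns, halving both axes twice.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # LSTM: read the frame sequence; input size = channels * reduced mel bins.
        self.lstm = nn.LSTM(32 * (n_mels // 4), 64, batch_first=True)
        self.head = nn.Linear(64, n_ragas)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.cnn(x)                       # (batch, 32, mels/4, frames/4)
        z = z.permute(0, 3, 1, 2).flatten(2)  # (batch, frames/4, 32 * mels/4)
        out, _ = self.lstm(z)
        return self.head(out[:, -1, :])       # classify from the last time step

model = RagaCNNLSTM()
logits = model(torch.randn(2, 1, 64, 128))  # 2 clips, 64 mel bins, 128 frames
```

The permute-then-flatten step is the whole trick: it turns the CNN's 2-D feature maps into the time-ordered sequence the LSTM expects.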

After multiple training iterations, the model performed shockingly well; the results from the saved .pth checkpoint seemed almost too good to be true. But a model performing well in theory doesn’t mean it will work perfectly in real-world applications, and the real challenge was yet to come.

Real-Time Identification: The Web App Challenge

Once the model started making decent predictions, the next step was to build an interactive system. I envisioned a web app with two features:
🎵 Upload an audio file → The app processes it and returns the raga name.
🎙 Record live singing → The app analyzes the real-time input and classifies the raga.

To achieve this, I eventually settled on FastAPI for the backend, a high-performance web framework that supports WebSockets for real-time updates. The biggest challenge? Live audio streaming and processing. Unlike a static file, a live recording had to be broken into chunks, processed on the fly, and classified in real time without delays. This required fine-tuning buffer sizes and optimizing the Fast Fourier Transform (FFT) calculations.
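The chunking itself is the easy-to-get-wrong part: samples arrive in arbitrary-sized packets, but the classifier wants fixed-size windows. A small buffer that accumulates samples and emits complete chunks (a generic sketch, not the app's actual buffer code) looks like this:

```python
import numpy as np

class ChunkBuffer:
    """Accumulate arbitrary-sized packets of samples; emit fixed-size chunks."""

    def __init__(self, chunk_size: int):
        self.chunk_size = chunk_size
        self.buffer = np.empty(0, dtype=np.float32)

    def push(self, packet: np.ndarray) -> list[np.ndarray]:
        """Append a packet; return every complete chunk now available."""
        self.buffer = np.concatenate([self.buffer, packet.astype(np.float32)])
        chunks = []
        while len(self.buffer) >= self.chunk_size:
            chunks.append(self.buffer[: self.chunk_size])
            self.buffer = self.buffer[self.chunk_size :]
        return chunks

# Simulate a stream: 10 packets of 300 samples feeding 1024-sample analysis
# chunks (3000 samples in total, so two full chunks plus a 952-sample remainder).
buf = ChunkBuffer(chunk_size=1024)
emitted = []
for _ in range(10):
    emitted.extend(buf.push(np.zeros(300)))
```

The chunk size becomes a latency knob: bigger chunks give the model more context per prediction but make the app feel slower.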

The Flask API Nightmare and WebSockets Struggles

My first attempt was to wrap the model in a Flask API so it could be accessed from a web application. I planned a Next.js frontend where users could:

  • Upload an audio file to get the raga name.

  • Record live singing and have the AI identify the raga in real time.

I expected this to be straightforward, but I underestimated the difficulties of handling real-time audio processing with WebSockets. Flask wasn’t cooperating, and setting up WebSockets for live audio streaming became a major headache.

  • Real-time latency issues made it hard to analyze continuous audio input.

  • Buffering problems caused unexpected delays in raga identification.

  • Deployment challenges meant Flask and Next.js weren’t communicating properly.

First look of the web app

Still a Work in Progress... But Closer Than Ever!

Even though I faced several roadblocks, Raga-Match is finally taking shape. The AI model works well, and I’ve successfully preprocessed the dataset, trained the CNN-LSTM model, and integrated the API—but live audio analysis is still a work in progress.

I’m currently experimenting with FastAPI instead of Flask, optimizing WebSockets, and fine-tuning real-time inference. The ultimate goal is to deploy this as a fully functional AI-powered web app, making Hindustani classical music recognition accessible to everyone.

Final Thoughts: The Beauty of Music and AI

Building Raga-Match has been an incredible experience. It started as a simple idea—combining my love for music and coding—but turned into a massive learning journey in AI, data collection, and web app deployment.

Despite all the struggles, seeing the model correctly classify ragas feels like magic. I still have a long way to go, but one thing’s for sure: Music and AI belong together, and this is just the beginning!
