A Quantum Leap in Audio Classification

Yash Thakar
5 min read

It all started one afternoon with a lively debate about a research project with our Head of Department, when she tossed out a wild idea: "What if we researched Quantum Computing and its applications in the domain of Machine Learning?" The room fell silent for a moment before erupting into arguments, nervousness, and excitement, the kind of excitement that comes with chasing the unknown. It was a lofty proposition, sure, but the more we talked about it, the more it seemed worth exploring. And thus began our journey into the world of Quantum Machine Learning. An initial literature review indicated that quantum computing, with its ability to leverage quantum mechanical phenomena like superposition and entanglement, offers a potential solution to some of the computational challenges faced in processing high-dimensional data.

The Problem with Audio Data

If you've ever tried to classify audio signals, you'll know they're no walk in the park. Audio data is messy: highly complex, multi-dimensional, and full of hidden patterns that can be maddeningly difficult to extract. Traditionally, spectrograms (those colorful frequency-vs-time plots you might have seen in audio processing tutorials) have been our go-to representation for wrangling this complexity.

A key step in our process was converting the raw audio signals into mel-spectrograms. These visual representations apply a non-linear transformation (the Mel scale) to the frequency axis, mimicking human auditory perception. This transformation helps in extracting salient features from complex audio data, making it easier for the neural network to learn and classify.
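The post doesn't walk through code, but as a rough sketch of this step, here's how a log-scaled mel-spectrogram might be computed with torchaudio (the library choice, file name, and parameter values are assumptions, not the paper's exact settings):

```python
import torchaudio
import torchaudio.transforms as T

# Load an audio clip (hypothetical file) as a waveform tensor.
waveform, sample_rate = torchaudio.load("clip.wav")

# The mel filterbank applies the non-linear Mel scale to the frequency axis.
to_mel = T.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=1024,
    hop_length=512,
    n_mels=64,
)
# Convert power to decibels, compressing dynamic range much like human hearing.
to_db = T.AmplitudeToDB()

mel_spec = to_db(to_mel(waveform))  # shape: (channels, n_mels, time_frames)
```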

But as powerful as classical CNNs have been in processing spectrograms, we wanted more—better accuracy, improved generalization, and maybe even a dash of quantum magic.

A Wild Idea Takes Shape

Quantum computing, with its superposition, entanglement, and general aura of wizardry, felt like a natural (if slightly intimidating) candidate to tackle high-dimensional problems like audio classification. Our hypothesis was simple yet daring: if classical CNNs could process spectrograms efficiently, perhaps adding a quantum layer could unlock even greater potential.

We didn’t know if it would work, but that was half the fun.

Building the Quantum-Classical CNN

Figure 1: (a) Quantum circuit used for the quantum layer, (b) the RealAmplitudes circuit

The blueprint for our hybrid model was a marriage of the familiar and the experimental:

  1. Classical CNN Core: The model starts with a classical CNN featuring three convolutional layers, each followed by ReLU activation and max-pooling. This part of the network is responsible for initial feature extraction from the input spectrograms.

  2. Quantum Neural Network (QNN) Layer: The output from the classical layers is then fed into a quantum circuit (sketched in code after this list). This circuit uses 8 qubits and is composed of two main parts:

    1. A FeatureMap circuit (specifically, a ZZFeatureMap) that encodes classical data into a quantum state.

    2. An ansatz circuit (RealAmplitudes) that applies quantum operations to process the encoded data.

  3. Final Classification: The output from the quantum layer is processed by a final fully connected layer to produce the classification result.
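To make the quantum layer concrete, here is a minimal sketch of how such a circuit can be assembled with Qiskit's circuit library. The reps value on the ansatz is an illustrative assumption, not necessarily the setting used in the paper:

```python
from qiskit import QuantumCircuit
from qiskit.circuit.library import ZZFeatureMap, RealAmplitudes

num_qubits = 8

# (1) Feature map: encodes 8 classical feature values into a quantum state.
feature_map = ZZFeatureMap(feature_dimension=num_qubits)

# (2) Ansatz: trainable rotation and entanglement layers that process
#     the encoded state.
ansatz = RealAmplitudes(num_qubits=num_qubits, reps=1)

# Compose feature map and ansatz into the full quantum-layer circuit.
qc = QuantumCircuit(num_qubits)
qc.compose(feature_map, inplace=True)
qc.compose(ansatz, inplace=True)
```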

To bring our vision to life, we turned to Qiskit, IBM's open-source quantum computing framework. More specifically, we used Qiskit's TorchConnector, a versatile class designed to plug the quantum circuit directly into a PyTorch model, treating it as a differentiable layer. This way, the entire hybrid model, classical CNN and quantum layer alike, could be trained end-to-end using backpropagation.
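Putting the pieces together, a hybrid model along these lines might look as follows. This builds on the circuit sketch above; the channel counts, the reduction to 8 features (one per qubit), and the single-observable readout are illustrative assumptions rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn
from qiskit_machine_learning.neural_networks import EstimatorQNN
from qiskit_machine_learning.connectors import TorchConnector

# Wrap the composed circuit (qc, feature_map, ansatz from the sketch
# above) as a differentiable QNN. By default EstimatorQNN measures a
# single Z...Z observable, so the layer emits one value per sample.
# input_gradients=True lets gradients flow back into the CNN below.
qnn = EstimatorQNN(
    circuit=qc,
    input_params=feature_map.parameters,
    weight_params=ansatz.parameters,
    input_gradients=True,
)

class HybridQCCNN(nn.Module):
    """Sketch of the hybrid model: classical CNN -> QNN -> classifier."""

    def __init__(self, num_classes: int):
        super().__init__()
        # Three conv blocks, each conv -> ReLU -> max-pool.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Squeeze the CNN features down to 8 values, one per qubit
        # (in practice these would be scaled before quantum encoding).
        self.to_qubits = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 8)
        )
        self.quantum = TorchConnector(qnn)            # differentiable quantum layer
        self.classifier = nn.Linear(1, num_classes)   # final fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)    # spectrogram -> feature maps
        x = self.to_qubits(x)   # feature maps -> 8 qubit inputs
        x = self.quantum(x)     # quantum layer -> (batch, 1)
        return self.classifier(x)
```

With this wiring, a standard PyTorch training loop works unchanged: calling loss.backward() propagates gradients through the final layer, the quantum circuit's trainable parameters, and the convolutional stack alike.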

This hybrid approach aims to leverage the strengths of both classical and quantum computing paradigms. It felt like we were Frankenstein-ing together two worlds, and we were equal parts nervous and thrilled to see if the model would come alive.

The Test: Two Datasets, Two Realities

We put our QC-CNN to the test with two datasets:

  • GTZAN Genre Classification Dataset: A small dataset with 10 music genres (100 files per genre).

  • Birdsong Dataset: A larger, more complex dataset with 1,000 audio samples per bird species.

Results and Analysis

We compared the performance of our QC-CNN against a classical CNN architecture across various metrics including accuracy, precision, recall, F1-score, and cross-entropy loss. The results were intriguing:
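For readers who want to reproduce this kind of comparison, these metrics can be computed with scikit-learn roughly as follows (a minimal sketch on toy data, not our actual evaluation code):

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, precision_recall_fscore_support, log_loss
)

# Toy stand-ins for model outputs on a 3-class problem.
y_true = np.array([0, 1, 2, 1])
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.6, 0.2],
                   [0.1, 0.2, 0.7],
                   [0.5, 0.4, 0.1]])
y_pred = y_prob.argmax(axis=1)

acc = accuracy_score(y_true, y_pred)
# Macro-averaging across classes is one common choice (an assumption here).
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
ce = log_loss(y_true, y_prob, labels=[0, 1, 2])  # cross-entropy loss

print(f"acc={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f} ce={ce:.2f}")
```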

Small Dataset (GTZAN)

  • Our QC-CNN achieved a training accuracy comparable to the classical CNN's. However, the QC-CNN struggled with generalization, exhibiting a higher cross-entropy loss, which indicated less confident predictions.

Large Dataset (Birdsong)

  • On the larger dataset, our QC-CNN achieved a better test accuracy than the classical CNN and also showed improved generalization, suggesting more confident predictions.

Lessons Learned (and a Few Headaches)

  1. Size Matters: Our QC-CNN's performance improved dramatically with the larger dataset. This was a key takeaway: quantum models seem to love data, and the more, the better.

  2. Overfitting? Not Today: On the larger dataset, the hybrid model showed reduced overfitting compared to the classical CNN, generalizing better to unseen data.

  3. Confidence Is Key: While accuracy was high, our QC-CNN's predictions on the smaller dataset felt shaky; the higher cross-entropy loss hinted that the model was often unsure, even when it was correct. Still, the improved performance on the larger, more complex dataset hints at the QC-CNN's potential for handling high-dimensional and intricate audio data. For detailed results, feel free to read the paper linked below.

The Future Looks Quantum

As we celebrated our initial results, we couldn’t help but look ahead. There’s so much more to explore:

  • Could we develop quantum-specific explainability techniques that provide valuable insights into the decision-making process of QC-CNNs?

  • Could a purely quantum approach outperform hybrids?

  • Can we extend these hybrid methods to tasks like speech recognition or music generation?

Of course, challenges remain. Quantum Machine Learning is still in its infancy, and debugging quantum circuits is a unique kind of headache. But the potential is undeniable. We believe quantum computing will soon unlock new dimensions (literally) in machine learning.

A Final Note

When we started this project, it felt like reaching for the stars. Now, after seeing what's possible, it feels like we've just cracked open the door to a new frontier. After months of rigorous research, our paper, "Performance Analysis of Hybrid Quantum-Classical Convolutional Neural Networks for Audio Classification," was accepted at the 15th International Conference on Computing, Communication, and Networking Technologies (ICCCNT) at IIT Mandi, India, and published in IEEE Xplore. The moment we received the acceptance email was surreal: a mix of disbelief and pure joy. Presenting our findings to an audience of researchers and academics was the culmination of our journey, and seeing them resonate with our work was incredibly rewarding.

Feel free to read the research at: https://doi.org/10.1109/ICCCNT61001.2024.10725668
