📦 How I Used TF-IDF, Word2Vec, and MLPs to Analyze Sentiment in Amazon Reviews

Khushal Jhaveri

During one of my early NLP experiments, I came across a massive dataset of Amazon product reviews and wondered: could I train a model to automatically detect whether a review is positive, negative, or neutral?

Plenty of people have done sentiment analysis before, but I wanted to build a more detailed pipeline: combine classic NLP features like TF-IDF with embeddings like Word2Vec, and test how each affects the results across different models, from SVMs and Perceptrons to full-on MLPs.


🧩 What I Wanted to Explore

The goal was to perform both:

  • Binary classification: Positive vs. Negative

  • Ternary classification: Positive, Neutral, Negative

And I wanted to answer a few questions:

  • Does custom Word2Vec beat pre-trained embeddings for specific product domains?

  • Can a simple Perceptron beat a deep network if the features are good?

  • How do dimensionality and preprocessing affect model performance?


🔧 What I Used

  • Python

  • Pandas, NumPy – data handling

  • NLTK – for text preprocessing

  • Scikit-learn – for TF-IDF, SVMs, Perceptron

  • Gensim – for training custom Word2Vec

  • PyTorch – for building the MLP models


πŸ› οΈ What I Did (Step-by-Step)

1️⃣ Data Preprocessing

  • Cleaned raw review text (a rough code sketch follows this list) by:

    • Removing HTML tags, links, punctuation

    • Expanding contractions (e.g., "can't" → "cannot")

    • Lowercasing everything

    • Lemmatizing and removing stopwords
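
Here's a minimal sketch of that cleaning step, assuming NLTK's WordNet lemmatizer and English stopword list. The contraction map and regexes are illustrative, not the exact ones from the project:

```python
import re

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# One-time setup: nltk.download("stopwords"); nltk.download("wordnet")
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "it's": "it is"}  # partial map, for illustration
STOP_WORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_review(text: str) -> list[str]:
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # strip links
    text = text.lower()                        # lowercase everything
    for contraction, expanded in CONTRACTIONS.items():
        text = text.replace(contraction, expanded)
    text = re.sub(r"[^a-z\s]", " ", text)      # drop punctuation and digits
    return [lemmatizer.lemmatize(tok) for tok in text.split()
            if tok not in STOP_WORDS]
```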

2️⃣ Feature Extraction

  • TF-IDF Vectors: for classic ML models (SVM, Perceptron)

  • Word2Vec Embeddings:

    • Used both pre-trained (Google News) and custom-trained on the Amazon review corpus

    • Represented reviews by averaging all word vectors in each review (see the sketch after this list)
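
Roughly, the two feature pipelines looked like this. Here `tokenized_reviews` stands for the output of the cleaning step above, and hyperparameters like `max_features`, `window`, and `min_count` are assumptions for the sketch:

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF for the classic models (tokens joined back into strings)
tfidf = TfidfVectorizer(max_features=10_000)
X_tfidf = tfidf.fit_transform([" ".join(toks) for toks in tokenized_reviews])

# Custom Word2Vec trained on the review corpus itself.
# Pre-trained alternative: gensim.downloader.load("word2vec-google-news-300")
w2v = Word2Vec(sentences=tokenized_reviews, vector_size=300,
               window=5, min_count=2, workers=4)

def review_vector(tokens, model, dim=300):
    """Average the vectors of in-vocabulary words; zeros if none match."""
    vecs = [model.wv[tok] for tok in tokens if tok in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X_w2v = np.vstack([review_vector(toks, w2v) for toks in tokenized_reviews])
```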

3️⃣ Model Training

  • Ran separate experiments for:

    • Binary classification using SVM, Perceptron, MLP

    • Ternary classification using MLP on 300D Word2Vec

  • Also experimented with dimensionality reduction (300D → 10D) to see the impact; a condensed training sketch follows this list
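
A condensed sketch of the training side. The scikit-learn models take the TF-IDF matrix directly, while the PyTorch MLP runs on the averaged Word2Vec vectors; layer sizes, dropout, and epoch count here are guesses for illustration, and one simple way to run the 10D experiment is to retrain Word2Vec with `vector_size=10`:

```python
import torch
import torch.nn as nn
from sklearn.linear_model import Perceptron
from sklearn.svm import LinearSVC

# Classic models straight on TF-IDF features (binary 0/1 labels assumed)
svm = LinearSVC().fit(X_tfidf, y_train)
perceptron = Perceptron().fit(X_tfidf, y_train)

# MLP over the 300-D averaged Word2Vec vectors
class SentimentMLP(nn.Module):
    def __init__(self, in_dim=300, hidden=128, n_classes=3):  # n_classes=2 for binary
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)

model = SentimentMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

X = torch.tensor(X_w2v, dtype=torch.float32)
y = torch.tensor(y_train, dtype=torch.long)
for epoch in range(20):  # full-batch for simplicity
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
```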

4️⃣ Evaluation

  • Used accuracy, precision, recall, F1-score

  • Plotted confusion matrices to analyze where models got confused (neutral vs. positive/negative); a minimal snippet follows
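
A minimal version of the evaluation, assuming a held-out `X_test`/`y_test` split and ternary labels encoded 0/1/2 in the order shown:

```python
import torch
from sklearn.metrics import classification_report, confusion_matrix

with torch.no_grad():
    logits = model(torch.tensor(X_test, dtype=torch.float32))
    y_pred = logits.argmax(dim=1).numpy()

# Precision, recall, and F1 per class
print(classification_report(y_test, y_pred,
                            target_names=["negative", "neutral", "positive"]))
print(confusion_matrix(y_test, y_pred))  # rows = true, columns = predicted
```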


📊 Results

| Model | Accuracy | Notes |
| --- | --- | --- |
| TF-IDF + SVM | ~88% | Strong on binary classification |
| TF-IDF + Perceptron | ~85% | Simpler, faster to train |
| Word2Vec (custom) + MLP (300D) | ~91% | Best result overall ✅ |
| Word2Vec (pre-trained) + MLP | ~87% | Struggled with domain words |
| Word2Vec (10D) + MLP | ~78% | Dropped due to loss of detail |

🧠 Key Insight: Custom embeddings outperformed pre-trained ones. Domain context matters!


💡 What I Learned

  • Preprocessing is everything in sentiment tasks: lemmatization and stopword removal helped models generalize better

  • Even small differences in word vector dimensionality can drastically affect results

  • TF-IDF is still insanely strong for linear models like SVMs

  • MLPs only help when paired with rich embeddings; raw input isn't enough


🧠 Why This Project Still Feels Relevant

Sentiment analysis might sound basic, but it's exactly what companies like Abnormal do at scale: detecting sentiment, tone, and intent in large amounts of textual input (like emails or messages). This kind of modeling also lays the foundation for understanding behavioral anomalies or emotional cues in language.


🧪 Future Ideas

  • Plug in BERT-style embeddings and compare results

  • Add attention layers to make the model context-aware

  • Turn it into a real-time review classifier web app using Streamlit


βœ‰οΈ Want to Try This or Discuss NLP Pipelines?

Always down to collaborate, debug, or brainstorm better approaches, especially if you're into text understanding or applying NLP in real-world systems.

📩 LinkedIn | 🔗 GitHub
