📦 How I Used TF-IDF, Word2Vec, and MLPs to Analyze Sentiment in Amazon Reviews

So during one of my early NLP experiments, I came across a massive dataset of Amazon product reviews and wondered: can I train a model to automatically detect whether a review is positive, negative, or neutral?
I knew plenty of people had done sentiment analysis before. But I wanted to try a more detailed pipeline: combining classic NLP features like TF-IDF with embeddings like Word2Vec, and testing how each choice affects results across different models, from SVMs and Perceptrons to full-on MLPs.
🧩 What I Wanted to Explore
The goal was to perform both:
Binary classification: Positive vs. Negative
Ternary classification: Positive, Neutral, Negative
And I wanted to answer a few questions:
Does custom Word2Vec beat pre-trained embeddings for specific product domains?
Can a simple Perceptron beat a deep network if the features are good?
How do dimensionality and preprocessing affect model performance?
🔧 What I Used
Python
Pandas, NumPy for data handling
NLTK for text preprocessing
Scikit-learn for TF-IDF, SVMs, and the Perceptron
Gensim for training custom Word2Vec
PyTorch for building the MLP models
🛠️ What I Did (Step-by-Step)
1️⃣ Data Preprocessing
Cleaned the raw review text by (see the sketch after this list):
Removing HTML tags, links, and punctuation
Expanding contractions (e.g., "can't" → "cannot")
Lowercasing everything
Lemmatizing and removing stopwords
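To make that concrete, here's a minimal sketch of the cleaning step. The post doesn't name a contraction-expansion tool, so I'm assuming the third-party `contractions` package here; the stopword list and lemmatizer come from NLTK, and the helper name `clean_review` is my own.

```python
import re

import contractions  # assumed helper for expansion; pip install contractions
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_review(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)       # strip HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # strip links
    text = contractions.fix(text)              # "can't" -> "cannot"
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)      # drop punctuation and digits
    tokens = [lemmatizer.lemmatize(t) for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(clean_review("I can't believe how <b>good</b> this phone is!"))
```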
2️⃣ Feature Extraction
TF-IDF vectors: for the classic ML models (SVM, Perceptron)
Word2Vec embeddings (see the sketch after this list):
Used both pre-trained (Google News) vectors and custom ones trained on the Amazon review corpus
Represented each review by averaging the vectors of all its words
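Roughly, the two feature pipelines look like this. It's a sketch under assumptions: `cleaned_reviews` stands in for the preprocessed corpus, and the TF-IDF vocabulary cap and Word2Vec hyperparameters are illustrative rather than the exact values from my runs.

```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import TfidfVectorizer
# import gensim.downloader as api  # api.load("word2vec-google-news-300") for the pre-trained variant

cleaned_reviews = ["cannot believe good phone", "battery died after two day"]  # toy placeholder corpus

# TF-IDF vectors for the classic ML models
tfidf = TfidfVectorizer(max_features=50_000)
X_tfidf = tfidf.fit_transform(cleaned_reviews)

# Custom Word2Vec trained on the review corpus itself
tokenized = [r.split() for r in cleaned_reviews]
w2v = Word2Vec(sentences=tokenized, vector_size=300, window=5, min_count=1, workers=4)
# vector_size=10 gives the low-dimensional variant from the experiments

def review_vector(tokens, model, dim=300):
    """Represent a review as the mean of its in-vocabulary word vectors."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X_w2v = np.vstack([review_vector(t, w2v) for t in tokenized])
```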
3️⃣ Model Training
Ran separate experiments for (see the sketch after this list):
Binary classification using SVM, Perceptron, and MLP
Ternary classification using an MLP on 300D Word2Vec features
Also experimented with dimensionality reduction (300D → 10D) to see the impact
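Here's a rough sketch of the training side. The placeholder data stands in for the real train split, and the MLP shown (one hidden layer with dropout, sizes of my choosing) is an illustrative stand-in, not necessarily the exact network from my experiments.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.linear_model import Perceptron
from sklearn.svm import LinearSVC

# Toy placeholder data; in practice X_tfidf / X_w2v come from the feature step above
rng = np.random.default_rng(0)
X_tfidf_train = rng.random((100, 500))                   # TF-IDF features (dense here for simplicity)
X_w2v_train = rng.random((100, 300)).astype(np.float32)  # averaged 300D Word2Vec vectors
y_train = rng.integers(0, 2, 100)                        # 0 = negative, 1 = positive

# Classic linear models on TF-IDF features (binary task)
svm = LinearSVC(C=1.0).fit(X_tfidf_train, y_train)
perceptron = Perceptron(max_iter=1000).fit(X_tfidf_train, y_train)

# MLP on the embedding features; set n_classes=3 for the ternary task
class SentimentMLP(nn.Module):
    def __init__(self, in_dim=300, hidden=128, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)  # raw logits; CrossEntropyLoss applies softmax internally

model = SentimentMLP()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

X = torch.tensor(X_w2v_train)            # already float32
y = torch.tensor(y_train, dtype=torch.long)
for epoch in range(20):                  # full-batch loop for brevity; mini-batches in practice
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
```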
4️⃣ Evaluation
Used accuracy, precision, recall, and F1-score (see the sketch after this list)
Plotted confusion matrices to analyze where the models got confused (mostly neutral vs. positive/negative)
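The metrics and confusion matrix come straight from scikit-learn. A minimal sketch with placeholder labels standing in for real test data; the ternary class names are my labeling of the setup, not copied from the code.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import ConfusionMatrixDisplay, classification_report

# Placeholders standing in for real test labels and model predictions
y_test = np.array([0, 1, 2, 1, 0, 2, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0, 2, 2, 1])

# Per-class precision, recall, and F1, plus overall accuracy
print(classification_report(y_test, y_pred, target_names=["negative", "neutral", "positive"], digits=3))

# The confusion matrix makes the neutral vs. pos/neg confusion visible
ConfusionMatrixDisplay.from_predictions(y_test, y_pred, display_labels=["negative", "neutral", "positive"])
plt.show()
```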
📊 Results
| Model | Accuracy | Notes |
| --- | --- | --- |
| TF-IDF + SVM | ~88% | Strong on binary classification |
| TF-IDF + Perceptron | ~85% | Simpler, faster to train |
| Word2Vec (custom) + MLP (300D) | ~91% | Best result overall ✅ |
| Word2Vec (pre-trained) + MLP | ~87% | Struggled with domain words |
| Word2Vec (10D) + MLP | ~78% | Dropped due to loss of detail |
🧠 Key Insight: custom embeddings outperformed pre-trained ones; domain context matters!
π‘ What I Learned
Preprocessing is everything in sentiment tasks; lemmatization and stopword removal helped the models generalize better
Even small differences in word vector dimensionality can drastically affect results
TF-IDF is still insanely strong for linear models like SVMs
MLPs only help when paired with rich embeddings; raw input isn't enough
🧠 Why This Project Still Feels Relevant
Sentiment analysis might sound basic, but it's exactly what companies like Abnormal do at scale: detecting sentiment, tone, and intent in large amounts of textual input (like emails or messages). This kind of modeling also lays the foundation for understanding behavioral anomalies or emotional cues in language.
🧪 Future Ideas
Plug in BERT-style embeddings and compare results
Add attention layers to make the model context-aware
Turn it into a real-time review classifier web app using Streamlit (see the sketch below)
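A minimal version of that Streamlit idea could look like this. It assumes the `clean_review`, `review_vector`, `w2v`, and `model` objects from the earlier sketches live in a hypothetical `pipeline` module; none of this exists in the project yet.

```python
import streamlit as st
import torch

from pipeline import clean_review, review_vector, w2v, model  # hypothetical module holding the trained pieces

st.title("Amazon Review Sentiment")
review = st.text_area("Paste a product review:")

if st.button("Classify") and review:
    tokens = clean_review(review).split()
    vec = torch.tensor(review_vector(tokens, w2v), dtype=torch.float32).unsqueeze(0)
    pred = model(vec).argmax(dim=1).item()
    st.write(["Negative", "Neutral", "Positive"][pred])
```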
✍️ Want to Try This or Discuss NLP Pipelines?
Always down to collaborate, debug, or brainstorm better approaches, especially if you're into text understanding or applying NLP to real-world systems.