Movie Recommendation System

In this article, I discuss my recent project, a Movie Recommendation System. The system recommends movies based on a user’s input movie title. To determine similarity, I used Cosine Similarity and Pearson Correlation to compare relationships between movies (or users) and generate relevant recommendations.

For this project, I used two datasets:

Movie Dataset: Contains the movie ID, title, and genre.
Ratings Dataset: Contains the user ID, movie ID, and the rating each user gave to multiple movies.

By combining these datasets, I was able to link movie details with their corresponding user ratings, which served as the foundation for building the recommendation system.
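As a rough illustration of this linking step, the two tables can be joined on the movie ID with pandas. The column names (`movieId`, `title`, `genres`, `userId`, `rating`) and the sample rows are assumptions made for this sketch, not values taken from the project.

```python
import pandas as pd

# Hypothetical sample rows; the real datasets are loaded from files.
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
    "genres": ["Animation|Children", "Adventure|Children", "Action|Crime"],
})
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3],
    "movieId": [1, 3, 1, 2],
    "rating": [4.0, 5.0, 4.5, 3.0],
})

# A left join keeps every rating and attaches the matching movie details.
merged = ratings.merge(movies, on="movieId", how="left")
print(merged[["userId", "title", "rating"]])
```

A left join on the ratings table keeps one row per rating, so the merged table has the same number of rows as the ratings dataset.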

I used TfidfVectorizer from sklearn.feature_extraction.text to convert the cleaned movie titles into numerical form using TF-IDF (Term Frequency–Inverse Document Frequency).

from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(movies["clean_title"])

ngram_range=(1, 2) means the vectorizer considers both single words (unigrams) and pairs of consecutive words (bigrams) when transforming the text into numbers.
The resulting tfidf matrix represents each movie title as a set of numerical values, allowing us to calculate similarity between titles.
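To show how the TF-IDF matrix can be used to find titles, here is a minimal sketch of a title search based on cosine similarity. The `search` function, its `top_n` parameter, and the four sample titles are assumptions for illustration; the project may implement this step differently.

```python
import re

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical sample titles standing in for the movie dataset.
movies = pd.DataFrame({
    "title": ["Toy Story (1995)", "Toy Story 2 (1999)", "Jumanji (1995)", "Heat (1995)"],
})
movies["clean_title"] = movies["title"].str.replace(r"[^a-zA-Z0-9 ]", "", regex=True)

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(movies["clean_title"])

def search(query, top_n=2):
    # Clean the query the same way as the titles, then project it
    # into the TF-IDF space learned from the titles.
    query_vec = vectorizer.transform([re.sub(r"[^a-zA-Z0-9 ]", "", query)])
    # One similarity score per movie title.
    similarity = cosine_similarity(query_vec, tfidf).flatten()
    # Indices of the highest scores, best match first.
    top = np.argsort(similarity)[::-1][:top_n]
    return movies.iloc[top].assign(score=similarity[top])

results = search("Toy Story")
print(results[["title", "score"]])
```

The query is transformed with the already fitted vectorizer rather than refitted, so its vector shares the same vocabulary as the title matrix.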

Before vectorizing the titles or running any of the recommendation steps, I cleaned the data. This involved removing every character other than letters, digits, and spaces from the movie titles. For this, I used a regular expression (regex) so the text was clean and consistent for processing.

import re

def clean_title(title):
    # Keep only letters, digits, and spaces
    return re.sub(r"[^a-zA-Z0-9 ]", "", title)
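As a quick usage sketch, the function can be applied to a title column to produce the `clean_title` column that the vectorizer reads. The two sample titles and the `movieId` values below are made up for illustration.

```python
import re

import pandas as pd

def clean_title(title):
    # Keep only letters, digits, and spaces
    return re.sub(r"[^a-zA-Z0-9 ]", "", title)

# Hypothetical sample rows standing in for the movie dataset.
movies = pd.DataFrame({
    "movieId": [1, 2],
    "title": ["Se7en (1995)", "Mission: Impossible (1996)"],
})

# Apply the cleaner to every title and store the result in a new column.
movies["clean_title"] = movies["title"].apply(clean_title)
print(movies["clean_title"].tolist())
```

Parentheses and the colon are removed, while the release year stays in the cleaned title.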

Based on these datasets, the recommendation process works as follows:

When a user enters a specific movie, I first select all the users who have rated that movie.
From those users, I create a list of movies they have rated 4 or above. This helps filter out lower-rated movies and focus on high-quality recommendations.
This results in a shorter list of movies, which I then refine further by keeping only the movies that are common among multiple users.
Finally, I calculate a similarity score for each remaining movie (using Cosine Similarity or Pearson Correlation) to rank and recommend the most relevant ones.
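The four steps above can be sketched with pandas. This is a minimal illustration under stated assumptions, not the project's actual code: the column names (`userId`, `movieId`, `rating`), the `recommend` function, and the `min_users` threshold are hypothetical, and the toy ratings are invented. Pearson correlation is used here for the final ranking.

```python
import pandas as pd

# Invented toy ratings; the real data comes from the ratings dataset.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4],
    "movieId": [10, 20, 30, 10, 20, 30, 40, 10, 20, 30, 20, 40],
    "rating":  [5.0, 5.0, 4.0, 4.0, 4.0, 4.5, 4.0, 4.5, 4.5, 2.0, 1.0, 5.0],
})

def recommend(movie_id, ratings, min_rating=4.0, min_users=2, top_n=10):
    # Step 1: users who rated the input movie.
    watchers = ratings.loc[ratings["movieId"] == movie_id, "userId"].unique()

    # Step 2: other movies those users rated min_rating or above.
    liked = ratings[
        ratings["userId"].isin(watchers)
        & (ratings["rating"] >= min_rating)
        & (ratings["movieId"] != movie_id)
    ]

    # Step 3: keep movies liked by at least min_users of those users.
    counts = liked["movieId"].value_counts()
    candidates = list(counts[counts >= min_users].index)

    # Step 4: Pearson correlation between each candidate's ratings
    # and the input movie's ratings, computed over shared users.
    matrix = ratings.pivot_table(index="userId", columns="movieId", values="rating")
    scores = matrix[candidates].corrwith(matrix[movie_id])
    return scores.dropna().sort_values(ascending=False).head(top_n)

scores = recommend(10, ratings)
print(scores)
```

In this toy data, movie 40 is dropped in step 3 because only one of the selected users rated it 4 or above, and the remaining candidates are ordered by their correlation with the input movie.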

In addition, I used Jupyter Notebook’s built-in widgets to create an interactive interface. This allowed users to search for a movie and instantly view the top 10 recommended movies based on their input.

GitHub Repository: https://github.com/Mehar-Zaman-Ali/Movie-Recommendation-System

Zaman Ali
