Movie Recommendation System


In this article, I discuss my recent project, a Movie Recommendation System. The system recommends movies based on a user’s input movie title. To determine similarity, I used Cosine Similarity and Pearson Correlation to compare relationships between movies (or users) and generate relevant recommendations.
For this project, I used two datasets:
Movie Dataset: Contains the movie ID, title, and genre.
Ratings Dataset: Contains the user ID, movie ID, and the rating that user gave to that movie, with one row per rating.
By combining these datasets, I was able to link movie details with their corresponding user ratings, which served as the foundation for building the recommendation system.
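As a rough illustration of that combining step, here is a minimal pandas sketch. The column names (`movieId`, `userId`, `rating`, `title`, `genres`) and the sample rows are assumptions made up for this example, so adjust them to match the actual files.

```python
import pandas as pd

# Tiny stand-in tables; column names and rows are assumptions for illustration.
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"],
    "genres": ["Animation|Children", "Adventure|Fantasy", "Action|Crime"],
})
ratings = pd.DataFrame({
    "userId": [10, 10, 11, 12],
    "movieId": [1, 2, 1, 3],
    "rating": [5.0, 3.5, 4.0, 4.5],
})

# Each rating row picks up the title and genres of the movie it refers to.
movie_ratings = ratings.merge(movies, on="movieId", how="left")
print(movie_ratings[["userId", "title", "rating"]])
```

A left join keeps every rating even if a movie ID is missing from the movie table, which makes such gaps easy to spot as empty titles.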
I used TfidfVectorizer from sklearn.feature_extraction.text to convert the cleaned movie titles into numerical form using TF-IDF (Term Frequency–Inverse Document Frequency).
from sklearn.feature_extraction.text import TfidfVectorizer

# Build TF-IDF features from the unigrams and bigrams of each cleaned title.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(movies["clean_title"])
ngram_range=(1, 2) means the vectorizer considers both single words (unigrams) and pairs of consecutive words (bigrams) when transforming the text into numbers. The resulting tfidf matrix represents each movie title as a set of numerical values, allowing us to calculate similarity between titles.
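To show how the TF-IDF matrix can be used to match a typed title against the catalog, here is a small sketch. The `search` helper and the four sample titles are hypothetical, not taken from the project. It transforms the query with the same fitted vectorizer and ranks titles by cosine similarity.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Sample cleaned titles (made up for this example).
movies = pd.DataFrame({"clean_title": [
    "Toy Story 1995", "Toy Story 2 1999", "Jumanji 1995", "Heat 1995",
]})

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(movies["clean_title"])

def search(query, top_n=3):
    """Return the top_n titles most similar to the query (hypothetical helper)."""
    # Reuse the fitted vocabulary so the query lives in the same vector space.
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, tfidf).flatten()
    best = np.argsort(scores)[::-1][:top_n]
    return movies.iloc[best].assign(score=scores[best])

results = search("Toy Story")
print(results)
```

Titles that share no words with the query get a score of 0, so in practice it may be worth dropping zero-score rows before showing results.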
Before performing any of the recommendation steps, I cleaned the data by removing every character other than letters, digits, and spaces from the movie titles. I used a regular expression (regex) for this, so the text was clean and consistent for processing.
import re

def clean_title(title):
    # Keep only letters, digits, and spaces.
    return re.sub(r"[^a-zA-Z0-9 ]", "", title)
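For a concrete picture of what the cleaning does, the sketch below applies the function to a few sample titles and stores the result in a `clean_title` column, which is the column the TF-IDF step reads. The sample titles and the `title` column name are assumptions for illustration.

```python
import re
import pandas as pd

def clean_title(title):
    # Keep only letters, digits, and spaces.
    return re.sub(r"[^a-zA-Z0-9 ]", "", title)

# Made-up sample rows; the real movie table has more columns.
movies = pd.DataFrame({
    "title": ["Toy Story (1995)", "Spider-Man (2002)", "Ocean's Eleven (2001)"],
})
movies["clean_title"] = movies["title"].apply(clean_title)
print(movies)
```

Parentheses, hyphens, and apostrophes are removed, so "Spider-Man (2002)" becomes "SpiderMan 2002". Note that the pattern only keeps ASCII letters, so accented characters in titles would be dropped as well.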
Based on these datasets, the recommendation process works as follows:
When a user enters a specific movie, I first select all the users who have watched that movie.
From those users, I create a list of movies they have rated 4 or above. This helps filter out lower-rated movies and focus on high-quality recommendations.
This results in a shorter list of movies, which I then refine further by keeping only the movies that are common among multiple users.
Finally, I calculate a similarity score for each remaining movie (using Cosine Similarity or Pearson Correlation) to rank and recommend the most relevant ones.
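The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the project's exact code: the `recommend` function, the `min_share` threshold for "common among multiple users", the zero-filling of missing ratings, and the toy ratings are all made up for this example. It ranks candidates by the cosine similarity between their rating column and the input movie's rating column.

```python
import numpy as np
import pandas as pd

# Toy ratings (userId, movieId, rating), invented for illustration.
ratings = pd.DataFrame(
    [
        (1, "A", 5.0), (1, "B", 4.5), (1, "C", 4.0),
        (2, "A", 4.5), (2, "B", 5.0), (2, "D", 2.0),
        (3, "A", 4.0), (3, "B", 4.0), (3, "C", 5.0),
        (4, "A", 2.0), (4, "D", 5.0),
        (5, "C", 4.5), (5, "D", 4.0),
    ],
    columns=["userId", "movieId", "rating"],
)

def recommend(movie_id, min_rating=4.0, min_share=0.5, top_n=10):
    # Step 1: users who rated the input movie.
    viewers = ratings.loc[ratings["movieId"] == movie_id, "userId"].unique()

    # Step 2: other movies those users rated min_rating or above.
    liked = ratings[
        ratings["userId"].isin(viewers)
        & (ratings["rating"] >= min_rating)
        & (ratings["movieId"] != movie_id)
    ]

    # Step 3: keep movies liked by at least min_share of those users (assumed threshold).
    share = liked["movieId"].value_counts() / len(viewers)
    candidates = share[share >= min_share].index

    # Step 4: cosine similarity between rating columns; missing ratings become 0 (assumption).
    matrix = ratings.pivot_table(index="userId", columns="movieId", values="rating").fillna(0.0)
    target = matrix[movie_id].to_numpy()
    scores = {}
    for candidate in candidates:
        other = matrix[candidate].to_numpy()
        scores[candidate] = float(target @ other / (np.linalg.norm(target) * np.linalg.norm(other)))
    return pd.Series(scores, dtype=float).sort_values(ascending=False).head(top_n)

top = recommend("A")
print(top)
```

With this toy data, movie D is filtered out in step 3 because only one of the four viewers of A rated it 4 or above, and B ranks above C. To use Pearson Correlation instead, one option is to subtract each column's mean before computing the same cosine formula.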
In addition, I used Jupyter Notebook’s built-in widgets to create an interactive interface. This allowed users to search for a movie and instantly view the top 10 recommended movies based on their input.
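A search box of this kind can be put together with the ipywidgets package, roughly as sketched below. The `recommend_titles` function is a hypothetical stand-in for the real recommender, and the three-character minimum before searching is an assumption.

```python
import ipywidgets as widgets
from IPython.display import display

def recommend_titles(query):
    """Hypothetical stand-in for the real recommender; returns up to 10 titles."""
    catalog = ["Toy Story 1995", "Toy Story 2 1999", "Jumanji 1995"]
    return [title for title in catalog if query.lower() in title.lower()][:10]

search_box = widgets.Text(description="Movie title:")
results = widgets.Output()

def on_type(change):
    # Re-run the recommender whenever the text changes, once the query is long enough.
    query = change["new"]
    with results:
        results.clear_output()
        if len(query) >= 3:
            for title in recommend_titles(query):
                print(title)

# Call on_type each time the text box's value changes.
search_box.observe(on_type, names="value")
display(search_box, results)
```

Run in a notebook cell, this shows a text field with a results area under it that refreshes as the user types.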
GitHub Repository: https://github.com/Mehar-Zaman-Ali/Movie-Recommendation-System
