Build a Movie Recommendation System Using Python and Machine Learning

In the age of streaming services and vast movie libraries, finding a movie that suits your tastes can be overwhelming. A personalized movie recommendation system can help users discover films they are likely to enjoy based on their preferences and past viewing history. In this blog post, we will walk through the process of building a movie recommendation system using Python and machine learning.

1. Understanding Recommendation Systems

Recommendation systems are designed to predict what a user might be interested in based on their past behavior or preferences. They are commonly used in various applications, from suggesting products in e-commerce to recommending movies on streaming platforms. There are several types of recommendation systems:

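  • Collaborative Filtering: Learns from user-item interactions such as ratings. It can be user-based (recommend what similar users liked) or item-based (recommend items similar to those the user already liked).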

  • Content-Based Filtering: Based on the attributes of the items and user preferences.

  • Hybrid Methods: Combine collaborative and content-based filtering.
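To make user-based collaborative filtering concrete, here is a minimal sketch that finds the user whose ratings are closest to a target user's, using cosine similarity over the movies both have rated. The toy ratings and the helper names (`cosine_similarity`, `most_similar_user`) are invented for illustration; real systems handle far larger and sparser data.

```python
from math import sqrt

# Toy ratings, invented for illustration: user -> {movie: rating}
toy_ratings = {
    "alice": {"Heat": 5.0, "Alien": 4.0, "Up": 1.0},
    "bob": {"Heat": 4.0, "Alien": 5.0},
    "carol": {"Up": 5.0, "Heat": 1.0},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users over the movies both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0  # no overlap, so no evidence of similarity
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def most_similar_user(target, ratings):
    """Return the other user whose ratings are closest to the target's."""
    others = (u for u in ratings if u != target)
    return max(others, key=lambda u: cosine_similarity(ratings[target], ratings[u]))


print(most_similar_user("alice", toy_ratings))  # prints "bob"
```

A user-based recommender would then suggest movies that the most similar users rated highly and the target user has not seen yet.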

For this guide, we will focus on Collaborative Filtering, particularly Matrix Factorization, which is a popular approach for movie recommendations.
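The idea behind matrix factorization can be sketched in a few lines of plain Python: give every user and every movie a short vector of latent factors, and adjust those vectors with stochastic gradient descent until their dot product (plus the global mean rating) approximates the known ratings. This is a simplified illustration, not the library implementation used later in this post; the toy data, function names, and hyperparameter values are all assumptions.

```python
import random
from math import sqrt

# Toy (user, movie, rating) triples, invented for illustration.
toy = [
    ("u1", "Heat", 5), ("u1", "Alien", 4), ("u1", "Up", 1),
    ("u2", "Heat", 4), ("u2", "Alien", 5), ("u2", "Up", 2),
    ("u3", "Heat", 1), ("u3", "Up", 5), ("u3", "Coco", 5),
    ("u4", "Alien", 2), ("u4", "Up", 4), ("u4", "Coco", 5),
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(triples, n_factors=2, n_epochs=300, lr=0.02, reg=0.02, seed=0):
    """Learn latent vectors so that mean + dot(p[u], q[i]) approximates each rating."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in triples) / len(triples)
    # Small random starting vectors for every user (p) and movie (q)
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)]
         for u in sorted({u for u, _, _ in triples})}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)]
         for i in sorted({i for _, i, _ in triples})}
    for _ in range(n_epochs):
        for u, i, r in triples:
            err = r - (mean + dot(p[u], q[i]))
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step with L2 regularization on both vectors
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mean, p, q


def predict(model, user, movie):
    mean, p, q = model
    return mean + dot(p[user], q[movie])


def rmse(model, triples):
    return sqrt(sum((r - predict(model, u, i)) ** 2 for u, i, r in triples) / len(triples))


untrained = factorize(toy, n_epochs=0)
trained = factorize(toy)
print(f"Training RMSE before: {rmse(untrained, toy):.3f}, after: {rmse(trained, toy):.3f}")
print(f"Predicted rating of u1 for Coco (never rated): {predict(trained, 'u1', 'Coco'):.2f}")
```

Because the vectors are learned jointly, the model can score a user-movie pair it has never seen, which is what makes this approach useful for recommendations.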

2. Preparing the Data

To build a recommendation system, you need a dataset of user interactions with movies. The MovieLens dataset is a widely used dataset for this purpose. It contains ratings for movies given by users, and it's available for free.

Here's how to get started:

  1. Download the Dataset:

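    • You can download the MovieLens dataset from the GroupLens website. The code below expects the ratings.csv and movies.csv files in your working directory.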
  2. Load the Dataset:

     import pandas as pd
    
     # Load movie ratings data
     ratings = pd.read_csv('ratings.csv')
     movies = pd.read_csv('movies.csv')
    
     print(ratings.head())
     print(movies.head())
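Before modelling, it helps to know how sparse the data is, since most users rate only a small share of all movies. The sketch below counts users, movies, and ratings and computes the filled share of the user-by-movie matrix. It uses only the standard library, and the sample rows are invented; they assume the file has a `userId,movieId,rating,timestamp` header, so check that against your download.

```python
import csv
import io

# Invented sample rows that mimic the assumed layout of ratings.csv.
SAMPLE = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
2,1,3.5,1445714835
3,6,5.0,1306463578
"""


def summarize(csv_file):
    """Count users, movies, and ratings, and compute the density of the rating matrix."""
    users, items, n_ratings = set(), set(), 0
    for row in csv.DictReader(csv_file):
        users.add(row["userId"])
        items.add(row["movieId"])
        n_ratings += 1
    # Density = share of all possible (user, movie) pairs that have a rating
    density = n_ratings / (len(users) * len(items)) if n_ratings else 0.0
    return {"users": len(users), "movies": len(items),
            "ratings": n_ratings, "density": density}


print(summarize(io.StringIO(SAMPLE)))
```

To run it on the real file, pass an open file handle instead, for example `with open('ratings.csv', newline='') as f: print(summarize(f))`.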
    

3. Building the Recommendation Model

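We will use matrix factorization, specifically the SVD algorithm from the Surprise library, to build our recommendation system. Surprise (published on PyPI as scikit-surprise) is a popular choice for building and evaluating recommendation systems. Its SVD is the factorization popularized during the Netflix Prize: it learns a latent vector for every user and movie by stochastic gradient descent rather than computing an exact singular value decomposition.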

  1. Install Required Libraries:

     pip install numpy pandas scikit-surprise
    
  2. Create the Recommendation Model:

     from surprise import Dataset, Reader, SVD
     from surprise.model_selection import train_test_split
     from surprise import accuracy
    
     # Load the dataset; skip_lines=1 skips the CSV header row
     reader = Reader(line_format='user item rating timestamp', sep=',', skip_lines=1)
     data = Dataset.load_from_file('ratings.csv', reader=reader)
    
     # Split data into training and testing sets
     trainset, testset = train_test_split(data, test_size=0.2)
    
     # Build the model
     model = SVD()
     model.fit(trainset)
    
     # Predict the held-out ratings and report the error
     # (accuracy.rmse prints the RMSE and also returns it)
     predictions = model.test(testset)
     accuracy.rmse(predictions)
    
  3. Generating Recommendations:

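     def get_top_n(predictions, n=10):
         """Return each user's n highest predicted ratings."""
         top_n = {}
         # Group (movie ID, estimated rating) pairs by user ID
         for uid, iid, true_r, est, _ in predictions:
             top_n.setdefault(uid, []).append((iid, est))
         # Keep each user's n best-scoring movies
         for uid, user_ratings in top_n.items():
             user_ratings.sort(key=lambda x: x[1], reverse=True)
             top_n[uid] = user_ratings[:n]
         return top_n
    
     # Note: predictions covers only the held-out test ratings, so this
     # ranks movies that each user has already rated.
     top_n_recommendations = get_top_n(predictions, n=10)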
    
  4. Personalized Recommendations for a Specific User:

     def get_movie_title(movie_id):
         return movies[movies['movieId'] == movie_id]['title'].values[0]
    
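     user_id = '1'  # Example user ID (IDs read from a file stay strings in Surprise)
     recommendations = top_n_recommendations.get(user_id, [])
     for movie_id, rating in recommendations:
         # movies.csv was loaded by pandas, so its movieId column holds integers
         print(f"Movie: {get_movie_title(int(movie_id))}, Predicted Rating: {rating:.2f}")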
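One caveat: the top-N list above is built from test-set predictions, so it only ranks movies the user has already rated. To suggest movies the user has not rated, you can score every movie that is missing from that user's training ratings. The sketch below is one way to do that, assuming `model` is the fitted SVD and `trainset` is the Surprise training set from the earlier steps; the function name `recommend_unseen` is hypothetical. It relies on the trainset's `to_inner_uid`, `ur`, `all_items`, and `to_raw_iid` members and on `model.predict`.

```python
def recommend_unseen(model, trainset, raw_uid, n=10):
    """Rank movies the user has not rated in the training set by predicted rating."""
    # Surprise works with inner IDs internally; this lookup raises ValueError
    # if the user does not appear in the training set
    inner_uid = trainset.to_inner_uid(raw_uid)
    # trainset.ur maps an inner user ID to a list of (inner item ID, rating) pairs
    seen = {inner_iid for inner_iid, _ in trainset.ur[inner_uid]}
    scored = []
    for inner_iid in trainset.all_items():
        if inner_iid in seen:
            continue
        raw_iid = trainset.to_raw_iid(inner_iid)
        scored.append((raw_iid, model.predict(raw_uid, raw_iid).est))
    # Highest predicted rating first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]
```

Calling `recommend_unseen(model, trainset, '1')` returns (movie ID, predicted rating) pairs that can be passed to `get_movie_title(int(movie_id))` as before. Movies held out in the test split count as unseen here, because they are absent from the training set. Surprise also offers `trainset.build_anti_testset()`, which builds the unrated pairs for all users at once.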
    

Final Code:

import pandas as pd
from surprise import accuracy
from surprise.model_selection import train_test_split
from surprise import Dataset, Reader, SVD

# Load movie ratings data
ratings = pd.read_csv('movie-recommendations/ratings.csv')
movies = pd.read_csv('movie-recommendations/movies.csv')

# Verify the data format
print(ratings.head())

# ratings.csv starts with a header row. Surprise's Reader has no header
# option, so that row is skipped with skip_lines=1 when the Reader is created.

# Load the dataset
reader = Reader(line_format='user item rating timestamp', sep=',',
                skip_lines=1)  # skip_lines=1 if the file includes headers
data = Dataset.load_from_file(
    'movie-recommendations/ratings.csv', reader=reader)

# Confirm that the ratings were loaded
print(f"Loaded {len(data.raw_ratings)} ratings")

# Split data into training and testing sets
trainset, testset = train_test_split(data, test_size=0.2)

# Build the model
model = SVD()
model.fit(trainset)

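# Predict the held-out ratings and report the error
predictions = model.test(testset)
rmse = accuracy.rmse(predictions, verbose=False)  # verbose=False avoids printing the value twice
print(f"RMSE: {rmse:.4f}")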

# Function to get top N recommendations
def get_top_n(predictions, n=10):
    top_n = {}
    # Group (movie ID, estimated rating) pairs by user ID
    for uid, iid, true_r, est, _ in predictions:
        top_n.setdefault(uid, []).append((iid, est))
    # Keep each user's n highest predicted ratings
    for uid, user_ratings in top_n.items():
        user_ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[uid] = user_ratings[:n]
    return top_n


top_n_recommendations = get_top_n(predictions, n=10)

# Function to get movie title
def get_movie_title(movie_id):
    return movies[movies['movieId'] == movie_id]['title'].values[0]


# Get recommendations for a specific user
user_id = '1'  # Example user ID (IDs read from a file stay strings in Surprise)
recommendations = top_n_recommendations.get(user_id, [])
for movie_id, rating in recommendations:
    print(
        f"Movie: {get_movie_title(int(movie_id))}, Predicted Rating: {rating:.2f}")

4. Evaluating and Improving the Model

Evaluating the performance of your recommendation system is crucial. You can use metrics such as Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) to measure accuracy.

Evaluation Metrics:

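  • Root Mean Squared Error (RMSE): The square root of the average squared difference between predicted and actual ratings. Squaring makes large errors count more heavily. Lower RMSE indicates better model performance.

  • Mean Absolute Error (MAE): The average absolute difference between predicted and actual ratings. It is read on the same scale as RMSE but is less sensitive to large errors.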
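The difference between the two metrics is easiest to see on a small example. The sketch below computes both by hand on invented (actual, predicted) pairs; the two lists have the same total absolute error, but one spreads it evenly and the other concentrates it in a single bad prediction.

```python
from math import sqrt


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return sqrt(sum((actual - predicted) ** 2 for actual, predicted in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(actual - predicted) for actual, predicted in pairs) / len(pairs)


# Invented (actual rating, predicted rating) pairs
even = [(4.0, 3.0), (3.0, 2.0), (5.0, 4.0), (2.0, 1.0)]   # four errors of 1
spiky = [(4.0, 4.0), (3.0, 3.0), (5.0, 5.0), (5.0, 1.0)]  # one error of 4

print(f"even:  MAE={mae(even):.1f}  RMSE={rmse(even):.1f}")    # MAE=1.0  RMSE=1.0
print(f"spiky: MAE={mae(spiky):.1f}  RMSE={rmse(spiky):.1f}")  # MAE=1.0  RMSE=2.0
```

MAE is identical for both lists, while RMSE doubles for the list with the single large miss. In Surprise, `accuracy.rmse(predictions)` and `accuracy.mae(predictions)` compute these values from a list of predictions.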

Improvement Techniques:

  • Fine-Tuning Hyperparameters: Use techniques like Grid Search to find the best parameters for your model.

  • Incorporating More Features: Include movie metadata (genres, directors, etc.) for a more comprehensive recommendation system.

  • Exploring Advanced Models: Consider using more sophisticated models like Neural Collaborative Filtering (NCF) or Hybrid Models.
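Grid search, the first technique above, simply tries every combination of candidate hyperparameter values and keeps the one with the lowest validation error. The sketch below shows that loop in plain Python. The scoring function is a made-up stand-in for "train the model with these settings and return its validation RMSE", and the parameter names and values are illustrative only.

```python
from itertools import product


def grid_search(score, param_grid):
    """Try every combination in param_grid; return (best_params, best_score), lower is better."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score


def fake_validation_rmse(n_factors, lr):
    """Stand-in for training a model and measuring validation RMSE (invented formula)."""
    return 0.9 + abs(n_factors - 50) / 1000 + abs(lr - 0.005) * 10


grid = {"n_factors": [20, 50, 100], "lr": [0.002, 0.005, 0.01]}
best_params, best_score = grid_search(fake_validation_rmse, grid)
print(best_params, best_score)
```

Surprise provides this loop with cross-validation built in, as `GridSearchCV` in `surprise.model_selection`. For SVD, typical parameters to search include `n_factors`, `n_epochs`, `lr_all`, and `reg_all`.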

Conclusion
Building a personalized movie recommendation system using Python and machine learning involves several key steps: data preparation, model building, and evaluation. By leveraging collaborative filtering techniques and tools like the surprise library, you can create a system that helps users find movies tailored to their tastes. Experiment with different models and techniques to continuously improve the recommendations and provide a better user experience.

Feel free to explore more advanced techniques and datasets as you refine your recommendation system and make it even more personalized!

Written by

ByteScrum Technologies