A One-Stop Destination for Recommendation Systems

Kavach Dheer
19 min read


Your one-stop destination for everything you need to know about recommendation systems, from the basics to an in-depth look at the major types.

Let's start with what a recommendation system is and why we need one.

Every day, we are inundated with choices and options. What to wear? What movie to rent? What stock to buy? What blog post to read? The sizes of these decision domains are frequently massive: Netflix has over 17,000 movies in its selection, and Amazon has over 410,000 titles in its Kindle store alone.

We need a system that helps us decide among this massive range of options, and this is exactly where a recommendation system comes into play.

Recommender systems are a set of tools and techniques that provide useful suggestions to users, helping them choose the right products or services and giving them a good user experience.

A recommendation system tailors the experience to each user, creating a personalised space for every one of them.

For example:

[1] Amazon shows programming titles to a software engineer and baby toys to a new mother.

A recommendation system infers what the user needs and suggests those things.

Data Collection

A recommendation system needs data to work, and there are two major ways in which that data is collected:

  • Explicit Feedback: collected through the user interface by asking users directly about their preferences.

Example:

When we create a new Spotify account, it asks which genres of music we like, which artists we like, our favourite band, and so on.

However, explicit feedback requires users to actively provide information. Not all users do, and even when they do, it may not be accurate, which compromises the system's ability to offer precise recommendations. Additionally, collecting explicit feedback can be resource-intensive and may impact the user experience.

  • Implicit Feedback: collected by observing the user. [3] If a user purchases an item, that is a sign that the user likes the item, while if the user purchases and then returns the item, that is a sign that the user doesn’t like it. Implicit feedback is more common and easier to collect because it doesn’t require users to explicitly rate or review items. It is often used when explicit feedback is scarce or unavailable.
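The purchase-and-return rule can be written down as a tiny function. This is only a sketch: the event names (`"purchase"`, `"return"`, `"view"`) and the numeric signal values are assumptions chosen for illustration, not something a particular platform uses.

```python
def implicit_score(events):
    """Collapse one user's events for one item into a like/dislike signal.

    Assumed encoding: +1 for a kept purchase, -1 for a purchase that was
    later returned, 0 when there is no purchase to learn from.
    """
    if "purchase" not in events:
        return 0
    return -1 if "return" in events else 1


print(implicit_score(["view", "purchase"]))
print(implicit_score(["purchase", "return"]))
```

Real systems usually combine many weaker signals (views, clicks, dwell time) in the same spirit.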

Recommendation systems understand user personalities by collecting and analyzing user data to derive insights about their preferences, behaviors, and interests. They keep collecting this data and storing it in a database.

Here are some examples of the data they draw on:

  1. Browsing History:

    • Analyzing a user's browsing history can provide insights into their interests. For example, frequent visits to technology news sites may indicate an interest in technology, while visits to cooking websites may suggest an interest in cooking.
  2. Search Queries:

    • Examining a user's search queries can reveal their information-seeking behavior and interests. Frequent searches for "fitness tips" or "healthy recipes" might suggest an interest in a healthy lifestyle.
  3. Social Media Activity:

    • Social media platforms are rich sources of data. Analyzing the content a user shares, likes, and comments on can provide insights into their political views, hobbies, favorite brands, and social connections.
  4. Content Consumption:

    • Understanding what type of content a user consumes can provide insights into their personality. For example, if they watch a lot of comedy videos, they might have a good sense of humor.
  5. Location Data:

    • Location data from mobile devices or check-ins on social media can give information about a user's travel habits, favorite places, and daily routines.
  6. App Usage:

    • Analyzing the apps a user has installed on their devices can reveal their interests and lifestyle. For instance, a user with apps for meditation and mindfulness might be interested in self-care.
  7. E-commerce History:

    • Online shopping habits, such as the types of products purchased and frequency of purchases, can provide insights into a user's lifestyle and preferences.
  8. Temporal Patterns:

    • The times at which a user is most active online can reveal their daily routine and sleep patterns.
  9. Sentiment Analysis:

    • Analyzing the sentiment of a user's comments, reviews, and posts can provide information about their emotional disposition and attitudes.
  10. Language and Writing Style:

    • A user's writing style, grammar, and vocabulary can offer clues about their educational level and communication style.
  11. Device Usage:

    • The choice of devices (e.g., PC, smartphone, tablet) and operating systems can indicate technological proficiency and preferences.
  12. Privacy Concerns:

    • The extent to which a user is concerned about online privacy, and the measures they take to protect their data, can suggest their attitude toward personal security and privacy.

In this article we will discuss the different types of recommendation systems, the algorithms used in them, and the merits and demerits of each.

Content-Based Recommendation System

Collaborative Recommendation System

Context-Based Recommendation System

Knowledge-Based Recommendation System

LLM-Based Recommendation System

Let's start with the first one.

Content Based Recommendation System

In a content-based recommendation system, the system suggests items to users based on their past preferences and likes.

Let’s illustrate how a content-based recommendation system works with a practical example in the context of movie recommendations:

Item Representation: Each movie in the system is represented by various features, such as genre, director, actors, and user ratings. For instance:

  • Movie A: Action, Sci-Fi, Directed by Christopher Nolan, Starring Leonardo DiCaprio, User Rating 4.5/5

  • Movie B: Drama, Romance, Directed by Greta Gerwig, Starring Saoirse Ronan, User Rating 4.0/5

  • Movie C: Action, Adventure, Directed by Steven Spielberg, Starring Harrison Ford, User Rating 4.2/5

User Profile Creation: The system builds a user profile based on the movies the user has interacted with. Let’s say the user has previously liked and watched:

  • Movie A: Action, Sci-Fi, Directed by Christopher Nolan, Starring Leonardo DiCaprio

  • The user’s profile might look like this:

  • Preferences: Action (high weight), Sci-Fi (medium weight), Christopher Nolan (high weight), Leonardo DiCaprio (medium weight)

Similarity Calculation: The system calculates the similarity between the user’s profile and other movies in the database. For example, it might find that Movie D shares many features with the user’s profile:

  • Movie D: Action, Sci-Fi, Directed by Christopher Nolan, Starring Tom Hardy, User Rating 4.3/5

  • The similarity calculation could result in a high similarity score between the user’s profile and Movie D because of the attributes it shares with Movie A.

Ranking and Filtering: Based on similarity scores, the system ranks the movies in the database. Movie D, being highly similar to the user’s profile, is ranked at the top.

Recommendation Generation: The system presents Movie D as a recommendation to the user, as it’s the most relevant to their preferences. The user is more likely to be interested in watching Movie D because it aligns with their past interactions and preferences.

Feedback Loop: If the user watches Movie D and provides feedback (e.g., by rating it or indicating whether they liked it), this feedback is incorporated into their profile. Over time, the user’s profile evolves to reflect their changing preferences.

Machine Learning Models: Machine learning algorithms continuously optimize the recommendation process. These algorithms learn from the user’s historical interactions and feedback to improve the accuracy of future recommendations.

This example simplifies the content-based process. In practice, large datasets and more complex algorithms are used to provide accurate recommendations.
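The profile-and-similarity steps from the walkthrough can be sketched in a few lines of Python. The movie attributes come from the example; the numeric weights (1.0 for "high", 0.5 for "medium"), the tag-set encoding, and the helper names `cosine` and `rank` are assumptions made purely for illustration.

```python
import math

# Candidate movies from the walkthrough, encoded as sets of attribute tags.
movies = {
    "Movie B": {"Drama", "Romance", "Greta Gerwig", "Saoirse Ronan"},
    "Movie C": {"Action", "Adventure", "Steven Spielberg", "Harrison Ford"},
    "Movie D": {"Action", "Sci-Fi", "Christopher Nolan", "Tom Hardy"},
}

# User profile built from Movie A. Weights are assumed: high=1.0, medium=0.5.
profile = {
    "Action": 1.0,
    "Sci-Fi": 0.5,
    "Christopher Nolan": 1.0,
    "Leonardo DiCaprio": 0.5,
}


def cosine(profile, features):
    """Cosine similarity between a weighted profile and a binary tag set."""
    dot = sum(weight for tag, weight in profile.items() if tag in features)
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_item = math.sqrt(len(features))
    if norm_profile == 0 or norm_item == 0:
        return 0.0
    return dot / (norm_profile * norm_item)


def rank(profile, movies):
    """Return (title, score) pairs sorted from most to least similar."""
    scored = [(title, cosine(profile, tags)) for title, tags in movies.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank(profile, movies)
for title, score in ranking:
    print(f"{title}: {score:.2f}")
```

With these assumed weights, Movie D ranks first because it shares three of the four profile tags, while Movie B shares none.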

Advantages

  • Personalization: Content-based recommendation systems can provide highly personalized recommendations because they focus on the specific preferences and characteristics of individual users. This personalization can enhance user satisfaction and engagement.

  • Domain-specific Recommendations: Content-based methods work well when there are domain-specific attributes or features that are important for recommendations. For example, in music recommendation, genre and artist preferences can be crucial.

Disadvantages

  • Limited Serendipity: Content-based recommendation systems may struggle to introduce users to new and unexpected items because recommendations are based on the user’s existing and past preferences.

  • For example, a user who has only ever followed football may later become interested in cricket, but the system has no past signal for cricket and keeps recommending football.

  • Dependence on User Feedback: Content-based recommendation systems rely entirely on user feedback to understand preferences, and recommendations are generated from this input. Not all users provide feedback, and even when they do, it may not always be accurate. Consequently, the system's ability to offer precise recommendations is compromised.

Collaborative Recommendation System

In collaborative recommendation systems, recommendations are made by identifying users with similar preferences to the target user. For example, when the system aims to suggest content to User A, it analyzes user profiles to find individuals with comparable tastes. If it determines that User B has a closely aligned profile with User A, the system recommends items to User A based on User B's preferences and likes.

Let’s illustrate how a collaborative recommendation system works with a practical example.

Let’s say we want to recommend movies to Kavach, who has already rated Movie A, Movie C, and Movie E. We’ll use user-based collaborative filtering, drawing on the ratings of three other users: Bob, Carol, and Dave.

Step 1: User Similarity Calculation (Cosine Similarity):

We calculate the similarity between Kavach and the other users based on their movie ratings.

  • Similarity(Kavach, Bob) = 0.28

  • Similarity(Kavach, Carol) = 0.95

  • Similarity(Kavach, Dave) = 0.27

Step 2: Neighbourhood Selection:

We select a subset of users with the highest similarity to Kavach. Let’s choose a similarity threshold of 0.5, so we select Carol (similarity 0.95).

Step 3: Rating Prediction:

Now, we predict Kavach's rating for movies he hasn’t seen (Movie B and Movie D) based on the ratings of users in his neighbourhood (in this case, just Carol).

  • Predicted Rating(Kavach, Movie B) = (Similarity(Kavach, Carol) * Rating(Carol, Movie B)) / Similarity Sum = (0.95 * 3) / 0.95 = 3

  • Predicted Rating(Kavach, Movie D) = (0.95 * 4) / 0.95 = 4

Step 4: Top-N Recommendations:

Now, we recommend the top-N movies with the highest predicted ratings to Kavach. Let’s say we recommend the top 2.

  • Top 2 Recommendations for Kavach: Movie D and Movie B

So, based on user-based collaborative filtering, Kavach should watch Movie D and Movie B because users with similar tastes (in this case, only Carol) liked these movies, and they are predicted to be good matches for Kavach's preferences.

This example simplifies the collaborative filtering process. In practice, large datasets and more complex algorithms are used to provide accurate recommendations.
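Steps 2 to 4 can be sketched in Python using the numbers from the example. The similarity scores and Carol's ratings are taken from the walkthrough; the function names are hypothetical, and Bob's and Dave's ratings are left out because the example does not give them (they fall below the threshold anyway).

```python
# Similarities to Kavach, taken from Step 1.
similarity = {"Bob": 0.28, "Carol": 0.95, "Dave": 0.27}

# Ratings for the movies Kavach has not seen. Only Carol's are given.
ratings = {"Carol": {"Movie B": 3, "Movie D": 4}}


def select_neighbours(similarity, threshold):
    """Step 2: keep users whose similarity exceeds the threshold."""
    return {user: sim for user, sim in similarity.items() if sim > threshold}


def predict(movie, neighbours, ratings):
    """Step 3: similarity-weighted average of the neighbours' ratings."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for user, sim in neighbours.items():
        rating = ratings.get(user, {}).get(movie)
        if rating is not None:
            weighted_sum += sim * rating
            similarity_sum += sim
    return weighted_sum / similarity_sum if similarity_sum else None


def top_n(movies, neighbours, ratings, n):
    """Step 4: the n movies with the highest predicted rating."""
    scored = [(movie, predict(movie, neighbours, ratings)) for movie in movies]
    scored = [(movie, score) for movie, score in scored if score is not None]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [movie for movie, _ in scored[:n]]


neighbours = select_neighbours(similarity, threshold=0.5)
recommendations = top_n(["Movie B", "Movie D"], neighbours, ratings, n=2)
print(neighbours)
print(recommendations)
```

With a single neighbour the weighted average collapses to that neighbour's own rating, which is why the predictions equal Carol's ratings.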

Two major types of algorithms are used in collaborative filtering:

  1. Memory-based algorithms: These are heuristic-based algorithms that try to predict the target user's rating for an item based on the partial information available about the target user and normalized weights obtained from the dataset. Commonly used techniques in memory-based algorithms are the Pearson correlation coefficient and vector similarity techniques. Some advanced techniques include default voting, inverse user frequency, case amplification, and imputation-boosted CF algorithms.

  2. Model-based algorithms: Model-based systems use various data mining and machine learning algorithms to develop a model for predicting the user’s rating for an unrated item. Commonly used methods in this category include Bayesian networks, clustering models, regression models, and latent semantic models.

Advantages

  1. No Need for Item Metadata: Unlike content-based recommendation systems that rely on item attributes, user-based collaborative filtering doesn’t require detailed information about items. It works solely based on user behavior, making it applicable to a wide range of item types.

  2. Serendipity: It can introduce users to new and unexpected items that they might not have discovered on their own. This serendipity can enhance the user experience by exposing users to diverse content.

Disadvantages

  1. Sparsity: Collaborative filtering can suffer from sparsity issues when dealing with large datasets, as most users have only interacted with a small fraction of available items. Techniques like matrix factorization and dimensionality reduction can help mitigate this issue.

  2. Cold Start Problem: Collaborative filtering struggles to provide recommendations for new users or items with no interaction history. Hybrid recommendation systems that combine collaborative filtering with content-based or other approaches can address this problem.

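Context-Based Recommendation System

Formally, the rating function takes context as a third input alongside the user and the item:

R : User × Item × Context → Rating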

A context-based recommendation system considers additional factors beyond just users and items when making recommendations. It takes into account contextual information such as time, location, temporal trends, device and platform, weather conditions, and so on to provide personalized recommendations.

Let’s illustrate how a context-based recommendation system works with a practical example

Scenario: Imagine a user named Sarah who uses a music streaming app. The context-based recommendation system takes into account the following contextual factors:

  1. Time of Day: It knows that Sarah usually listens to different types of music during different times of the day. For instance, she prefers energetic songs in the morning, calming music in the afternoon, and upbeat tracks in the evening.

  2. Location: The system is aware of Sarah's location. When she's at the gym, it recommends high-tempo workout music. When she's at home, it suggests relaxing tunes.

  3. Weather Conditions: It checks the weather in Sarah's area. On sunny days, it recommends cheerful, feel-good tracks. On rainy days, it suggests mellow and comforting songs.

  4. Listening Device: If Sarah is using her smartphone, the system recommends music that suits her on-the-go lifestyle. If she's at home using a smart speaker, it offers music for a relaxed atmosphere.

  5. Listening History: The system also considers Sarah's listening history. If she's been listening to a lot of jazz lately, it continues to suggest jazz music.

Recommendations: Based on these contextual factors, the context-based recommendation system might recommend the following:

  • In the morning, while Sarah is jogging in the park (location and time context), the system suggests her favorite upbeat songs.

  • When it's raining (weather context) and she's at home (location context), it recommends cozy acoustic tracks.

  • In the evening (time context) at a friend's party (location context), it suggests a party playlist.

  • When she's using her smart speaker (device context) and cooking dinner (activity context), it offers some background music.

The system continually adapts its recommendations based on the changing context, ensuring that Sarah receives personalized music suggestions tailored to her specific situation and preferences.

How Contextual Information Is Used to Provide Recommendations in Context-Based Systems

  1. Context Pre-filtering:

    • Definition: Context pre-filtering refers to the process of filtering or pre-selecting items for recommendation based on contextual information before applying traditional recommendation algorithms.

    • Purpose: It helps reduce the item pool to a more manageable size by considering the current context, such as time, location, or user activity. This, in turn, improves the efficiency and relevance of recommendations.

  2. Context Post-filtering:

    • Definition: Context post-filtering takes place after recommendations are generated by traditional recommendation algorithms. It involves filtering and refining the recommendations based on contextual information.

    • Purpose: It ensures that the final recommendations align with the current context of the user, making them more relevant and personalized.

  3. Contextual Modeling:

    • Definition: Contextual modeling is the process of incorporating contextual information into recommendation models. It involves creating models that can adapt to various contextual factors to provide more personalized and relevant recommendations.

    • Purpose: Contextual modeling aims to make recommendations more dynamic and responsive to changing user contexts. It often involves machine learning techniques to learn how context influences user preferences.
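The difference between pre-filtering and post-filtering is mostly about where the context check sits in the pipeline. The sketch below makes that concrete with a made-up track catalogue; the track titles, scores, context tags, and the `base_recommender` stand-in are all assumptions for illustration.

```python
# Hypothetical catalogue. "score" stands in for the output of any
# context-free recommender; "contexts" lists where the track fits.
tracks = [
    {"title": "Sunrise Run", "score": 0.9, "contexts": {"morning", "gym"}},
    {"title": "Rainy Cafe", "score": 0.8, "contexts": {"rainy", "home"}},
    {"title": "Night Drive", "score": 0.7, "contexts": {"evening"}},
    {"title": "Slow Acoustic", "score": 0.6, "contexts": {"home", "evening"}},
]


def base_recommender(candidates, k):
    """Context-free stand-in: return the k highest-scoring candidates."""
    return sorted(candidates, key=lambda t: t["score"], reverse=True)[:k]


def pre_filter(candidates, context, k):
    """Pre-filtering: shrink the item pool by context, then recommend."""
    matching = [t for t in candidates if context in t["contexts"]]
    return base_recommender(matching, k)


def post_filter(candidates, context, k):
    """Post-filtering: recommend from everything, then drop mismatches."""
    ranked = base_recommender(candidates, len(candidates))
    return [t for t in ranked if context in t["contexts"]][:k]


home_pre = [t["title"] for t in pre_filter(tracks, "home", 2)]
home_post = [t["title"] for t in post_filter(tracks, "home", 2)]
print(home_pre)
print(home_post)
```

In this toy setup both routes return the same tracks because the base scores are fixed. With a learned model they can differ, since pre-filtering changes the data the model sees while post-filtering only trims its output. Contextual modeling, the third option, would instead feed the context into the model itself.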

Some of the algorithms and techniques used in context-aware recommender systems (CARS) are:

  1. Matrix Factorization (MF):

    • Explanation: Matrix Factorization is a collaborative filtering technique used in CARS. It decomposes a user-item interaction matrix into latent factors, allowing it to learn and predict user preferences based on historical interactions. In CARS, it can be extended to incorporate contextual information, such as time, location, or user attributes, to provide more personalized recommendations.

    • Example : In a music recommendation system, Matrix Factorization can be applied to factorize the user-song interaction matrix. Contextual information such as the user's location, time of day, and mood can be incorporated as additional features. MF learns latent factors for users, songs, and context, enabling it to recommend music that matches a user's preferences in specific contexts, like suggesting upbeat songs for a user's morning commute.

  2. Tensor Factorization (TF):

    • Explanation: Tensor Factorization is an extension of matrix factorization that deals with multi-dimensional data (tensors). In the context of CARS, it can model complex relationships between users, items, and multiple contextual dimensions simultaneously, making it suitable for scenarios where context plays a significant role in recommendations.

    • Example : Consider a video streaming service. TF can be used to factorize a three-dimensional tensor representing user-movie-time interactions. This allows the system to capture how users' preferences for movies change over time. TF can recommend movies based on a user's historical viewing habits, time of day, and even their device type (e.g., TV, mobile) to offer context-aware movie suggestions.

  3. Latent Dirichlet Allocation (LDA):

    • Explanation: LDA is a probabilistic model used for topic modeling and document analysis. In CARS, it can be employed to extract latent contextual factors or topics from user and item data, helping to understand user preferences in different contexts.

    • Example : In a news recommendation system, LDA can analyze news articles and classify them into topics (e.g., politics, sports, technology). Users' historical reading preferences can be modeled with LDA, enabling the system to recommend articles based on both user interests and the current topic context. For instance, if a user has been reading technology news, LDA can recommend technology articles.

  4. Markov Models:

    • Explanation: Markov models, including Hidden Markov Models (HMMs), are used for modeling sequential data. In CARS, they are useful when there is a sequence of interactions or a temporal aspect involved in recommendations, such as recommending a series of products or services over time.

    • Example : In an e-commerce platform, a first-order Markov model can be applied to model user behavior sequences. It can predict what a user is likely to purchase next based on their recent browsing and purchase history. Contextual factors like time of year (e.g., holiday season) can influence the transition probabilities in the Markov model.

  5. Bandit Algorithms:

    • Explanation: Bandit algorithms, specifically contextual bandit algorithms, are employed to balance exploration (trying new recommendations) and exploitation (recommending known high-performing items). In CARS, they are valuable when the system needs to adapt quickly to changing user preferences or contextual factors.

    • Contextual Use: Contextual bandit algorithms can make recommendations while considering the current context, such as a user's location, device, or recent interactions, to maximize the likelihood of user engagement and satisfaction.

    • Example : Imagine a mobile app recommending restaurants. A contextual bandit algorithm can consider the user's location, preferences, and the current time of day to suggest nearby restaurants that align with the user's tastes. The algorithm can adapt its recommendations based on the changing context, like suggesting breakfast places in the morning and dinner options in the evening.
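Of the techniques listed, the first-order Markov model is the easiest to show in a few lines. The sketch below estimates transition probabilities from browsing sequences and suggests the most likely next item. The session data and function names are invented for illustration, and context (for example, time of year) is left out to keep it short.

```python
from collections import Counter, defaultdict

# Hypothetical browsing sessions; each list is one user's item sequence.
sessions = [
    ["phone", "case", "charger"],
    ["phone", "case", "screen protector"],
    ["laptop", "mouse"],
    ["phone", "charger"],
]


def fit_transitions(sessions):
    """Count how often each item is directly followed by each other item."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def next_item_probabilities(counts, current):
    """Normalise the counts for one item into transition probabilities."""
    followers = counts.get(current)
    if not followers:
        return {}
    total = sum(followers.values())
    return {item: count / total for item, count in followers.items()}


def recommend_next(counts, current):
    """Most likely next item after the current one, or None if unseen."""
    followers = counts.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


counts = fit_transitions(sessions)
print(next_item_probabilities(counts, "phone"))
print(recommend_next(counts, "phone"))
```

A context-aware variant could keep one transition table per context (say, one per season) and pick the table that matches the current situation.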

Curse of Dimensionality

The "curse of dimensionality" is a big issue for context-based recommendation algorithms because as the number of contextual factors (dimensions) increases, it becomes challenging to efficiently process and analyze the data.

Some of the techniques used to address this issue are:

  • Matrix Factorization (MF) is the most popular technique for dimensionality reduction in CARS due to its natural low-rank approximation.

  • Principal Component Analysis (PCA) is also commonly used for dimensionality reduction.

  • Other techniques such as bandit algorithms, Markov models, and Learning to Rank (LTR) are used in CARS but are still under theoretical research for dimensionality reduction.

Knowledge-Based Recommendation System

A Knowledge-Based Recommendation System is like a helpful assistant that suggests things you might like. It works by taking into account what you tell it you want and what it knows about the things it can suggest.

Here's how it works:

  1. You Tell It: You start by telling the system what you're looking for. For example, you might say you want a phone with a good camera and a budget-friendly price.

  2. System Knows About Items: The system already has a lot of information about the things it can suggest, like phones. It knows details about their features, prices, and more.

  3. Matching Your Needs: It compares what you told it with what it knows about the items. So, it looks for phones that have good cameras and fit your budget.

  4. Suggestions: Based on this comparison, it suggests a list of phones that it thinks you might like. These could be ranked from the best match to the least.

  5. Explains Why: It also tells you why it made these suggestions. For instance, it might say, "I recommend this phone because it has a great camera and it's within your budget."

  6. You Give Feedback: Once you have tried one of the suggested phones, you can tell the system whether you liked it. This helps it learn more about your preferences.
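Steps 1 to 5 amount to constraint matching plus an explanation. Here is a minimal sketch, assuming "good camera" is expressed as a minimum megapixel count and "budget-friendly" as a maximum price. The phone catalogue, attribute names, and thresholds are all made up for the example.

```python
# Hypothetical catalogue; names, specs, and prices are invented.
phones = [
    {"name": "Phone X", "camera_mp": 48, "price": 299},
    {"name": "Phone Y", "camera_mp": 12, "price": 199},
    {"name": "Phone Z", "camera_mp": 64, "price": 799},
]


def recommend(phones, min_camera_mp, max_price):
    """Return (name, explanation) pairs for phones meeting both constraints.

    Matches are ordered by camera resolution, best first.
    """
    matches = [
        phone for phone in phones
        if phone["camera_mp"] >= min_camera_mp and phone["price"] <= max_price
    ]
    matches.sort(key=lambda phone: phone["camera_mp"], reverse=True)
    return [
        (
            phone["name"],
            f"{phone['camera_mp']} MP camera meets your minimum of "
            f"{min_camera_mp} MP, and the price of {phone['price']} is "
            f"within your budget of {max_price}.",
        )
        for phone in matches
    ]


suggestions = recommend(phones, min_camera_mp=40, max_price=400)
for name, reason in suggestions:
    print(f"{name}: {reason}")
```

Because every suggestion is tied to an explicit rule, the explanation in step 5 falls out of the matching logic for free.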

LLMs for Recommendation Systems

LLMs (large language models) are being used in recommendation systems because they offer a promising answer to several challenges of the traditional approaches. They can generate more natural and explainable recommendations, help with the cold start problem, and make cross-domain recommendations.

LLMs can improve the performance of recommender systems without relying on external retrievers.

In-Context Learning

This type of recommender system does not require task-specific training data. Instead, it depends on what the user types and how they interact with the system. For example, when we describe our preferences to ChatGPT, it adapts its answers to what we have said earlier in the conversation.

In short, it learns from the user's earlier interactions.

With an LLM-enhanced recommender system, it is beneficial to learn users’ preferences during the conversation. After each step of the conversation, the user’s preferences can be refined further to update the candidate recommendation results.

Problems Faced by All Recommendation Systems

Data Bias

Data bias refers to systematic deviations in the data used by a recommender system that can result in unfair or inaccurate recommendations.

There are many different types of bias in recommendation systems, such as:

  1. Popularity Bias - Popular items keep being shown to users, so less popular items rarely get exposure. As a result, the system stops being truly personalised and simply follows what is trending in the market.

  2. Selection Bias - Selection bias refers to the deviation caused by the fact that users do not rate every item. Specifically, users tend to rate products they are interested in, products they are very satisfied with, and products they are very dissatisfied with. The vast majority of products do not belong to these three categories, which causes the Missing Not At Random (MNAR) problem and results in deviation.

To mitigate these biases, researchers mainly rely on debiasing techniques, in addition to utilizing score calibration losses and propensity scores to improve model efficiency. These approaches aim to increase user confidence and trust in the system by ensuring data fairness and reducing discriminatory outcomes.

Robustness

Robustness is the ability of a recommender system to maintain accurate and effective recommendations in the face of varying or unexpected data.

  • Some people deliberately try to attack a recommender system in order to corrupt its recommendations or its algorithm. This is called an adversarial attack.

  • There are two categories of adversarial attack:

  1. White Box Attack - In a white-box attack the attacker has access to the model information and uses gradient-based methods such as FGSM, PGD, and C&W to search for adversarial perturbations.

  2. Black Box Attack - In a black-box attack, the attacker does not have access to the model information and uses methods such as model substitution or DeepFool to attack.

Adversarial training and model distillation are two commonly used defense methods for enhancing the robustness of recommender systems. They address the challenge of robustness by adding perturbations during training and by transferring knowledge, respectively.

Fairness

Fairness denotes the principle that a recommender system should provide unbiased and equitable recommendations for all users, regardless of their demographic or personal characteristics.

It can be divided into two types:

  1. User-Based Fairness - Users should not be discriminated against by the recommendation system because of their own sensitive attributes.

  2. Item-Based Fairness - Each item should have an equal chance to be recommended.

Various methods have been proposed for item-based fairness in recommender systems, including causal inference, adversarial training, meta-learning, and reinforcement learning. Each method has its strengths and weaknesses, and the choice of method depends on the specific requirements of the system.

Synonymy problem

This problem arises when similar or related items have different entries or names, or when the same item is represented by two or more names in the system [78]. For example, "babywear" and "baby cloth". Many recommender systems fail to recognise that such names refer to the same thing, which reduces their recommendation accuracy.

How Are Recommendation Systems Evaluated?

Recommendation systems are evaluated in two ways:

  1. Offline Evaluation - This type of evaluation is performed on a dataset collected before the system is designed. Some of the commonly used offline evaluation metrics are:
  • Root Mean Squared Error

  • Mean Absolute Error

  • Precision

  • Recall

  • F1 Score

  2. Online Evaluation - This type of evaluation is used when the experiment is conducted in real time. It measures the real-time feedback of users. Some of the common metrics are:
  • Click-Through Rate

  • Bounce Rate
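The offline metrics and click-through rate above are simple enough to compute by hand. The sketch below uses their standard textbook definitions. The function names and the sample inputs are assumptions for illustration only.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between true and predicted ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for a recommendation list.

    recommended: list of item ids shown to the user.
    relevant: set of item ids the user actually liked.
    """
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def click_through_rate(clicks, impressions):
    """Share of shown recommendations that were clicked."""
    return clicks / impressions if impressions else 0.0


# Made-up sample data.
print(rmse([4, 3, 5], [3, 3, 4]))
print(mae([4, 3, 5], [3, 3, 4]))
print(precision_recall_f1(["A", "B", "C", "D"], {"A", "C", "E"}))
print(click_through_rate(clicks=30, impressions=1000))
```

RMSE and MAE judge predicted ratings, while precision, recall, and F1 judge the recommended list itself, so which ones matter depends on whether the system predicts ratings or produces a ranked list.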
