Hello! I am back with another story from my life. Recently, I have been working on a side project named Shop Naturally. It is a natural-language search for e-commerce: you can visit the website and type a query like “can you suggest me a budget phone under 30000?”, and it will understand the sentence and generate a database query, which in turn searches the database and returns 8 recommended phones.
Now, when I started working on this problem, my friends and I figured it would be pretty easy to develop. Just scrape the data from the e-commerce website and store it in the DB. Then, when the DB query comes in, sort the data by number of reviews, rating, and how many people bought the phone last month (a few websites provide this data). If you want more, you can sort by RAM and storage as well.

This approach works great in theory. But in real-life scenarios, you encounter a few problems. The sorting keys are all treated as equally important, so the resulting answer is not great.

Let me explain with an example. I uploaded this image to a few social media platforms as a survey.

Most people chose Phone Y for themselves because, at the cheaper price point, they were getting more storage. In my case, the last sorting key was RAM, so RAM size effectively carried the highest weight in the algorithm, and it chose Phone X, unlike most people.
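To see why the last sorting key ends up dominating, here is a small Python sketch. The specs are made up for illustration (the numbers from the survey image are not reproduced here), and it assumes the sorting was done one key at a time with a stable sort, which is how Python's list.sort behaves.

```python
# Hypothetical specs, only for illustration.
phones = [
    {"name": "Phone X", "price": 28000, "ram": 12, "storage": 128},
    {"name": "Phone Y", "price": 25000, "ram": 8, "storage": 256},
]

# Python's sort is stable, so when you sort key by key,
# the key applied LAST decides the final order.
ranked = list(phones)
ranked.sort(key=lambda p: p["price"])                  # cheapest first
ranked.sort(key=lambda p: p["storage"], reverse=True)  # then most storage
ranked.sort(key=lambda p: p["ram"], reverse=True)      # RAM is applied last

top_pick = ranked[0]["name"]
print(top_pick)
```

Phone Y wins on price and storage, but Phone X still comes out on top because RAM was sorted last.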


Users on Instagram express their preference for Phone Y.

With 79% of the votes, it was clear that Phone Y was the winner. So I had to tweak my recommendation algorithm to get better product suggestions. Here is what I did to make it better:

Filter based on the user's criteria, such as price, RAM, and storage, if the user has given any.
If only the max price is given, as in “budget phones in 30000”, add a min price as well. Most users do not want to see 10k or 15k phones in the recommendations for that query.
Give different keys different weights: bought, reviews, and rating get weights of 4, 3, and 2, and so on for the other crucial keys. A score is calculated for each phone from these weights.
Then sort the phones by the calculated score (highest first) and then by price (lowest first).
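The steps above can be sketched roughly like this in Python. This is not the exact code behind Shop Naturally. The field names, the 60% price floor, and the min-max normalisation are my assumptions for the sketch; only the weights 4, 3, and 2 and the sort order come from the steps above.

```python
WEIGHTS = {"bought": 4, "reviews": 3, "rating": 2}  # weights from the steps above
MIN_PRICE_RATIO = 0.6  # assumption: floor is 60% of the max price


def normalise(values):
    """Scale values to [0, 1] so keys with big numbers do not swamp the score."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def recommend(phones, max_price=None, min_price=None,
              min_ram=None, min_storage=None, limit=8):
    # If only a max price is given, derive a min price from it.
    if max_price is not None and min_price is None:
        min_price = max_price * MIN_PRICE_RATIO

    # Step 1 and 2: filter on whatever criteria the user gave.
    candidates = [
        p for p in phones
        if (max_price is None or p["price"] <= max_price)
        and (min_price is None or p["price"] >= min_price)
        and (min_ram is None or p["ram"] >= min_ram)
        and (min_storage is None or p["storage"] >= min_storage)
    ]
    if not candidates:
        return []

    # Step 3: weighted score per phone.
    scores = [0.0] * len(candidates)
    for key, weight in WEIGHTS.items():
        for i, value in enumerate(normalise([p[key] for p in candidates])):
            scores[i] += weight * value

    # Step 4: highest score first, then lowest price first.
    scored = [dict(p, score=s) for p, s in zip(candidates, scores)]
    scored.sort(key=lambda p: (-p["score"], p["price"]))
    return scored[:limit]


catalogue = [
    {"name": "A", "price": 29000, "ram": 8, "storage": 128,
     "bought": 500, "reviews": 2000, "rating": 4.3},
    {"name": "B", "price": 12000, "ram": 6, "storage": 128,
     "bought": 3000, "reviews": 9000, "rating": 4.4},  # below the price floor
    {"name": "C", "price": 24000, "ram": 8, "storage": 256,
     "bought": 900, "reviews": 1500, "rating": 4.5},
    {"name": "D", "price": 35000, "ram": 12, "storage": 256,
     "bought": 700, "reviews": 1200, "rating": 4.6},  # above the max price
]
picks = recommend(catalogue, max_price=30000)
print([p["name"] for p in picks])
```

With a max price of 30000, phone B is dropped by the derived min price even though it has the most reviews, and phone D is dropped for being over budget.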

The results were good at this stage. However, I noticed a pattern: a whole lot of Redmi phones have massive review counts, and they were dominating my algorithm. So I decided that no brand should make up more than 50% of the total results. To handle that, I made these changes:

Group phones by brand and limit each brand to its top 4 phones (50% of the 8 results).
Then combine these groups back into one list of individual phones.
Sort the diversified list again by score and price.
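Here is a minimal Python sketch of that brand cap. It assumes each phone already has a score, and the brand, score, and price field names are hypothetical. The sample data is made up to show the effect.

```python
from collections import defaultdict


def diversify(scored_phones, total=8, max_share=0.5):
    per_brand_cap = int(total * max_share)  # 4 when total is 8

    # Group phones by brand.
    by_brand = defaultdict(list)
    for phone in scored_phones:
        by_brand[phone["brand"]].append(phone)

    # Keep only the best few phones of each brand.
    kept = []
    for group in by_brand.values():
        group.sort(key=lambda p: (-p["score"], p["price"]))
        kept.extend(group[:per_brand_cap])

    # Merge back into one list and sort again by score, then price.
    kept.sort(key=lambda p: (-p["score"], p["price"]))
    return kept[:total]


# Six high-scoring Redmi phones plus five phones from other brands.
sample = [{"brand": "Redmi", "score": 9 - i, "price": 20000 + i} for i in range(6)]
sample += [{"brand": b, "score": 2, "price": 25000}
           for b in ("Samsung", "Realme", "Vivo", "Oppo", "Poco")]

result = diversify(sample)
redmi_count = sum(p["brand"] == "Redmi" for p in result)
print(redmi_count, len(result))
```

Without the cap, Redmi would take 6 of the 8 slots in this sample. With it, Redmi gets 4 and the other brands fill the rest.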

This is what the final recommendation algorithm looks like, and thanks to an LLM for helping me out with this.

If you have made it this far, I have a present for you. Visit Shop Naturally and enjoy the phone recommendations.

If you have any feedback or know how I can make it better, please reach out to me :)

Recommendations are hard


Khushal Sharma
