Building AnimeRecBERT: How I Used BERT to Create a Personalized Anime Recommendation System

Web Demo: animerecbert.online
I wanted to build an anime recommendation system and came across a GitHub repo that used BERT trained on the MovieLens dataset. The impressive results it achieved inspired me to adapt the approach for anime recommendations. What started as a side project turned into a full-scale experiment with millions of anime ratings and one powerful language model: BERT.
To build a high-quality dataset, I scraped over 4 million user anime lists from:
AniList
Kitsu
MyAnimeList
After filtering for users with at least 10 ratings, I was left with around 1.5 million users. For this experiment, I used a subset of 600,000 users for faster training and prototyping.
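For the curious, the filtering step boils down to something like this. It's a minimal sketch: the CSV layout and column names are my shorthand here, not the project's actual preprocessing code.

```python
import pandas as pd

# Minimal sketch of the filtering step (column names are illustrative).
ratings = pd.read_csv("ratings.csv")  # columns: user_id, anime_id, rating

# Keep only users who rated at least 10 anime.
counts = ratings.groupby("user_id")["anime_id"].transform("count")
ratings = ratings[counts >= 10]

print(f"{ratings['user_id'].nunique():,} users remain after filtering")
```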
After testing various approaches like VAE and Matrix Factorization, BERT consistently delivered the best results. So I built AnimeRecBERT — a BERT-based system trained on this dataset with 54 million ratings.
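To make "best results" concrete: recommenders like these are typically compared with top-K ranking metrics such as Recall@K on held-out items. Here's a minimal, illustrative sketch of that metric, not the repo's actual evaluation code.

```python
# Illustrative Recall@K: fraction of a user's held-out anime that the
# model places in its top-k recommendations. Not the repo's eval code.
def recall_at_k(ranked_items, held_out, k=10):
    hits = len(set(ranked_items[:k]) & set(held_out))
    return hits / min(k, len(held_out))

# Example: model's ranking starts [5, 9, 2, 8, 1, ...]; the user's hidden
# test items are {9, 7}; one of the two appears in the top 5 -> 0.5.
print(recall_at_k([5, 9, 2, 8, 1, 7, 4, 3, 6, 0], [9, 7], k=5))  # 0.5
```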
The results? It actually gets my taste right.
The Moment of Truth: Testing My Own System
To evaluate the model qualitatively, I gave it a list of my 9 favorite anime:
Classroom of the Elite
Pseudo Harem
Don’t Toy With Me, Miss Nagatoro
86 (Eighty-Six)
Mushoku Tensei
Made in Abyss
Shangri-La Frontier
The Case Study of Vanitas
Hell’s Paradise
NOTE: The order of the favorites does not affect inference results; the model uses only the presence of items, not their sequence.
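To see why, here's a tiny self-contained demo (not the project's code, and all IDs are placeholders): a transformer encoder with no positional encoding is permutation-equivariant, so the hidden state at the [MASK] position depends only on which anime are present, not on their order.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

emb = nn.Embedding(1000, 32)          # toy vocab of 1,000 anime IDs
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
encoder.eval()                        # disable dropout for a clean comparison

MASK = 999                            # placeholder [MASK] token ID
favorites = [101, 202, 303]           # placeholder anime IDs
shuffled  = [303, 101, 202]           # same set, different order

with torch.no_grad():
    out_a = encoder(emb(torch.tensor([favorites + [MASK]])))[0, -1]
    out_b = encoder(emb(torch.tensor([shuffled  + [MASK]])))[0, -1]

print(torch.allclose(out_a, out_b, atol=1e-5))  # True: order doesn't matter
```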
Top Recommendations: Call of the Night, Summertime Rendering, Heavenly Delusion, and Jujutsu Kaisen.
Out of the top 15 recommendations, 6 were anime I’d already watched and loved.
The rest? Titles like Frieren, Tonikaku Kawaii, Solo Leveling, and Heavenly Delusion fit my taste almost perfectly.
🧠 BERT Model Architecture
This project builds upon BERT4Rec-VAE-Pytorch (https://github.com/jaywonchung/BERT4Rec-VAE-Pytorch), with several key modifications:
- Custom dataset scraped from AniList, Kitsu, and MyAnimeList
- GUI-based inference script for quick testing
- Positional encoding removed from the architecture (user lists carry no timestamp info), as sketched below
- Input sequence length: 128 anime tokens
- Increased dropout for better regularization
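Put together, the modified architecture looks roughly like this. The hidden size, head count, and layer count are illustrative guesses, not the repo's actual hyperparameters:

```python
import torch
import torch.nn as nn

class AnimeRecBERT(nn.Module):
    """Sketch of the modified BERT4Rec: item embeddings only, no positions."""

    def __init__(self, num_items, hidden=256, heads=4, layers=2,
                 max_len=128, dropout=0.3):
        super().__init__()
        self.max_len = max_len  # input sequence length: 128 anime tokens
        # +2 reserves ID 0 for padding and the last ID for [MASK].
        self.item_emb = nn.Embedding(num_items + 2, hidden, padding_idx=0)
        # Deliberately no positional embedding: the scraped lists have no
        # timestamps, so the input is treated as an unordered set of anime.
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, dim_feedforward=hidden * 4,
            dropout=dropout, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.out = nn.Linear(hidden, num_items + 2)  # score every anime

    def forward(self, item_ids):                 # (batch, <= max_len)
        x = self.item_emb(item_ids)
        x = self.encoder(x, src_key_padding_mask=item_ids.eq(0))
        return self.out(x)                       # (batch, seq, vocab)

# Usage: score all items at the [MASK] position appended to a user's list.
model = AnimeRecBERT(num_items=20_000)
```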
GitHub Repo
📂 You can try the model for yourself!
Setup & inference instructions are available on GitHub:
👉 https://github.com/MRamazan/AnimeRecBERT
Web Demo: animerecbert.online