An Effort to Fight Illegal Online Gambling (Judol) Promotions in Indonesia using AI

Iqbal Maulana

Amid all the other nonsensical things happening in Indonesia recently, our YouTube comment sections are flooded with promotions for illegal online gambling, known as Judol. Whether a creator gets a few hundred views or has millions of subscribers, if you watch their videos daily, chances are you'll run into these Judol comments. What makes them unusual is that the posters are well aware of YouTube's moderation system, which would easily block plain-text keywords. So they disguise the site names with fancy Unicode characters from blocks such as Mathematical Alphanumeric Symbols, Enclosed Alphanumerics, and Cyrillic. Thanks to ChatGPT for helping me identify these character blocks.

Because of this, traditional keyword-matching systems struggle: a name written in fancy Unicode does not match the same name stored as a plain-text keyword. That's exactly what's happening with YouTube's moderation settings. In the creator dashboard, you can only block exact keywords or block the commenters themselves. After reading some creator feedback, it's clear that manually blocking every fancy-Unicode variant one by one is hard and time-consuming. In this article, I bring out the big gun, Generative AI (an LLM), to extract these Judol keywords, which can save creators time when blocking them and shrink the space for Judol promotions across YouTube.
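To make the matching problem concrete, here is a minimal sketch of how fancy Unicode can be folded back to plain text before comparing it with a keyword. It is not part of the app described in this article. It relies on the standard `String.prototype.normalize("NFKC")`, which maps Mathematical Alphanumeric Symbols and Enclosed Alphanumerics to their plain letters and digits. Cyrillic look-alikes are ordinary letters of another script, so normalization leaves them alone; the small `CYRILLIC_LOOKALIKES` table and the function name `foldFancyText` are hypothetical, and the table is far from complete.

```typescript
// Hypothetical, incomplete table of Cyrillic capitals that resemble Latin ones.
const CYRILLIC_LOOKALIKES: Record<string, string> = {
  "\u0410": "A",
  "\u0412": "B",
  "\u0415": "E",
  "\u041A": "K",
  "\u041C": "M",
  "\u041D": "H",
  "\u041E": "O",
  "\u0420": "P",
  "\u0421": "C",
  "\u0422": "T",
  "\u0425": "X",
};

// Folds styled Unicode text to lowercase plain text for keyword comparison.
function foldFancyText(input: string): string {
  // NFKC replaces compatibility characters (math bold, circled, ...) with
  // their plain equivalents.
  const compat = input.normalize("NFKC");
  // Array.from iterates by code point, so astral characters stay intact.
  return Array.from(compat)
    .map((ch) => CYRILLIC_LOOKALIKES[ch] ?? ch)
    .join("")
    .toLowerCase();
}

// "DEMO77" is a made-up name, written here in Mathematical Bold.
const styled = "\u{1D403}\u{1D404}\u{1D40C}\u{1D40E}\u{1D7D5}\u{1D7D5}";
console.log(foldFancyText(styled)); // demo77
```

Folding like this only helps a system that controls the matching step. YouTube's Blocked Keywords list matches the text as typed, which is why the approach in this article collects the exact disguised spellings instead.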

Dear YouTube Creators

You can visit the website directly at https://judol-watchdog.mbts.dev. There, you can see the Judol keywords I gathered by sampling comments from the @WilliamJakfar channel.

As instructed on the website, you can copy all these keywords and paste them into the Blocked Keywords section under Settings > Community Moderation.

You can also block users who promote those Judol websites on the same settings page, under the Hidden Users section.

By doing this, we can limit the keywords previously used to promote Judol. According to my calculations, there are over 1 billion possible combinations for writing a single Judol website name with the fancy Unicode I mentioned earlier. Still, I hope that this small action, combined with many creators joining forces to block all the known keywords on their channels, can reduce Judol promotions across YouTube.
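To get a feel for how quickly that number grows, here is a rough sketch. The inputs are assumptions chosen for illustration, not the exact calculation behind the figure above: a name of 8 characters, where each character can be swapped independently for one of 14 look-alike forms. The function name `countSpellings` is hypothetical.

```typescript
// Counts the spellings of a name when every character can independently
// take any of `formsPerChar` look-alike forms (formsPerChar ^ nameLength).
function countSpellings(nameLength: number, formsPerChar: number): number {
  let total = 1;
  for (let i = 0; i < nameLength; i++) {
    total *= formsPerChar;
  }
  return total;
}

// Assumed numbers: 8 characters, 14 forms per character.
const spellings = countSpellings(8, 14);
console.log(spellings); // 1475789056
console.log(spellings > 1e9); // true
```

Under these assumptions, a single 8-character name already has more than a billion spellings, so blocking can only ever cover the variants that have actually been seen.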

Dear Engineers

I hope you're curious about how I built this website from the ground up. If so, let's break it down. Or jump to the code directly on my GitHub below.

Every project I create needs a purpose. Besides being annoyed by these Judol comments, I wanted to learn a new tech stack. In this project, I wanted hands-on experience with Bun. I aimed to use Bun as natively as possible, testing its claim of being batteries-included, with no need for third-party HTTP servers, HTTP clients, or SQL connectors. And that's exactly what I did: I relied on Bun's native functionality throughout. For the client side, I didn't want to use React, so I chose HTMX. I then dockerized everything and deployed it to Fly.io.
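With HTMX, the server answers with small HTML fragments instead of JSON, and the page swaps them in through attributes such as `hx-get` and `hx-swap`. The following is a minimal sketch of that idea, not the code from the repository: the function names, the `keywords` element id, and the idea of a dedicated keyword-list fragment are all assumptions. Wiring the function into an HTTP route is left out.

```typescript
// Escapes characters that have special meaning in HTML, so that a keyword
// taken from a comment cannot inject markup into the page.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders the keyword list as an HTML fragment that HTMX can swap in.
function renderKeywordList(keywords: string[]): string {
  const items = keywords.map((k) => `<li>${escapeHtml(k)}</li>`).join("");
  return `<ul id="keywords">${items}</ul>`;
}

console.log(renderKeywordList(["sawer4d"]));
// <ul id="keywords"><li>sawer4d</li></ul>
```

Escaping matters here because the keywords come from untrusted comment text.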

Now for the LLM part: as a broke college student, I used Meta Llama 3 70B hosted on Groq for its affordable pricing. To make it even cheaper, I used their batch processing, which offers a 25% discount. Below is the general flow I use.

During development, I found that smaller models often produced inaccurate answers when extracting Judol keywords, so I chose Meta Llama 3 70B, which is said to be comparable to GPT-4 on some metrics.

I also noticed that when processing more than 50 comments per request, Llama tends to repeat words and hallucinate. Therefore, I limit each request to 50 comments. Below, you can find the System Prompt I use, with the temperature set to 1 and max tokens set to 1024.

You are an assistant to help reduce illegal online-gamble promotion in youtube comments.
You will be provided an array of youtube comments inside <comment> tag.
You need to extract exact word from given comments that are highly possible to be the online-gambling name.
Do not hallucinate, only response with text within provided comments.

Examples:

<comment>
Buat yang belum coba, kalian harus coba sekarang juga di ๐˜ผ๐˜Œ๐‘๐˜–๐Ÿด๐Ÿช!
Gacir banget tiap main di ะHMA๐˜ฟ๐‘‡O๐™๐Ž,nggak pernah bikin kecewa!
Nggak salah pilih main di ๐ด๐†U๐’๐‘‡O๐˜›ะž,rezekinya ngalir terus. Top banget!
Gak ada yang tau kapan rezeki datang, tapi di A๐†๐‘ˆ๐‘†T๐Ž๐˜›ะž,semuanya bisa terjadi!
Hasil gacir bikin aku makin puas main di ๐€๐™€๐™๐™Š๐Ÿด๐Ÿพ,makasih banyak!
ะGU๐‘†๐‘‡๐‘‚๐‘‡Omenawarkan berbagai fitur yang menarik bagi sebagian pemain.
Main bentar langsung gacir. Rezeki nggak bisa diprediksi di D๐‘‚๐™A7๐Ÿฉ!
๐ฆ๐š๐ข๐ง ๐๐ข sini ๐ ๐š๐œ๐จ๐ซ ๐ก๐š๐›๐ข๐ฌ ๐›๐š๐ซ๐ฎ ๐ฌ๐š๐ฃ๐š ๐ฆ๐š๐ข๐ง ๐ฌ๐ฎ๐๐š๐ก ๐๐ข ๐ค๐š๐ฌ๐ข ๐ฆ๐š๐ฑ๐ฐ๐ข๐ง ๐ข ๐ฅ๐จ๐ฏ๐ž ๐ฒ๐จ๐ฎ sawer4d ๐ž๐ฆ๐ฆ๐ฆ๐ฆ๐ฎ๐š๐š๐š๐ก๐ก.
<comment/>

๐˜ผ๐˜Œ๐‘๐˜–๐Ÿด๐Ÿช,ะHMA๐˜ฟ๐‘‡O๐™๐Ž,๐ด๐†U๐’๐‘‡O๐˜›ะž,A๐†๐‘ˆ๐‘†T๐Ž๐˜›ะž,๐€๐™€๐™๐™Š๐Ÿด๐Ÿพ,ะGU๐‘†๐‘‡๐‘‚๐‘‡O,D๐‘‚๐™A7๐Ÿฉ,sawer4d
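The pieces described above (50 comments per request, temperature 1, max tokens 1024, and a comma-separated answer) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the repository: the function names are hypothetical, the model id is passed in as a placeholder, the request body assumes an OpenAI-style chat completion format, and the way the comments are joined inside the `<comment>` tag is a guess based on the example in the prompt. The last function is an extra safeguard that the article does not describe: it drops any returned keyword that does not literally appear in the comments that were sent.

```typescript
const COMMENTS_PER_REQUEST = 50; // limit stated in the article

// Splits the comment list into groups of at most `size` comments.
function chunkComments(comments: string[], size: number = COMMENTS_PER_REQUEST): string[][] {
  const chunks: string[][] = [];
  for (let i = 0; i < comments.length; i += size) {
    chunks.push(comments.slice(i, i + size));
  }
  return chunks;
}

// Builds one chat request body for a chunk of comments.
// Assumption: OpenAI-style chat format; `model` is a placeholder id.
function buildRequestBody(systemPrompt: string, chunk: string[], model: string) {
  return {
    model,
    temperature: 1,
    max_tokens: 1024,
    messages: [
      { role: "system", content: systemPrompt },
      // Tag layout copied from the example in the system prompt.
      { role: "user", content: `<comment>\n${chunk.join("\n")}\n<comment/>` },
    ],
  };
}

// Parses the comma-separated answer and keeps only keywords that occur
// verbatim in at least one comment of the chunk, without duplicates.
function keepGroundedKeywords(modelOutput: string, chunk: string[]): string[] {
  const kept = new Set<string>();
  for (const raw of modelOutput.split(",")) {
    const keyword = raw.trim();
    if (keyword !== "" && chunk.some((comment) => comment.includes(keyword))) {
      kept.add(keyword);
    }
  }
  return Array.from(kept);
}
```

For example, `keepGroundedKeywords("sawer4d, madeup99", ["main di sawer4d ya"])` returns `["sawer4d"]`, because `madeup99` does not occur in the comment. Submitting the request bodies through Groq's batch processing is not shown here.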

Closing Statement

I hope this article provides valuable insights for both YouTube creators and software engineers. Please give it a try and share your feedback. YouTube creators especially: if you notice Judol comments on your channel whose keywords aren't listed on the website, you can email me at iqbal@mbts.dev to have your channel scanned automatically by this app as well. Thank you!

Written by

Iqbal Maulana

Hi, my name is Iqbal! I am a software engineer based in Indonesia. I work as a backend engineer, but my heart is in AI Engineering, haha. I'll write mainly about these two topics. See you in the next article!