How to Detect Telegram Impersonators with a Bot

As community moderators for Alephium, we’ve encountered numerous scammers infiltrating the chat under seemingly casual identities. When someone asks, “Is there someone who can help me?” or “I have an issue with the bridge,” these scammers quickly rename themselves to impersonate well-known figures, such as team members or community moderators.

They copy profile pictures, adopt identical display names, and create usernames that closely resemble the original, tricking users into believing they are legitimate. The image below illustrates this tactic: the real username is "veganeutral," but the scammer inserted one extra letter to get "veganeuitral" and deceive others.

Since these scammers rely on similar usernames and display names to deceive users, why not develop a bot to detect and counter them? Various algorithms can be used for this purpose, some of which are commonly employed in predictive typing. Here’s a quick list of potential methods:

Hamming distance
Levenshtein distance
Jaro–Winkler distance

Hamming distance

It measures the minimum number of substitutions required to change one string into the other, or equivalently, the minimum number of errors that could have transformed one string into the other (source: https://en.wikipedia.org/wiki/Hamming_distance).

Hamming distance works only with strings of the same length, making it less effective for detecting impersonation, where usernames and display names may have varying lengths.
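To make the length restriction concrete, here is a minimal Python sketch. The function name `hamming_distance` is ours, chosen for illustration, and is not taken from the bot's code.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        # Hamming distance is undefined for strings of different lengths.
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))


# Same length: the swapped "iu" / "ui" counts as two differing positions.
print(hamming_distance("alephium", "alephuim"))  # 2

# Different lengths: the comparison cannot be made at all.
try:
    hamming_distance("veganeutral", "veganeuitral")
except ValueError as exc:
    print(exc)
```

The second call fails because the impersonator's name is one character longer than the original, which is exactly the kind of case the bot has to handle.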

Levenshtein distance

Levenshtein distance measures how different two words or sequences are by counting the number of changes needed to transform one into the other. It considers three types of operations:

Insertion – Adding a character (e.g., changing "user" to "users").
Deletion – Removing a character (e.g., changing "admin" to "admn").
Substitution – Replacing one character with another (e.g., changing "mod" to "mad").
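The three operations above can be sketched with the classic dynamic-programming approach. This is an illustrative implementation under our own naming (`levenshtein`), not the code used by the bot.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the stored row short
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("user", "users"))  # 1 (insertion)
print(levenshtein("admin", "admn"))  # 1 (deletion)
print(levenshtein("mod", "mad"))     # 1 (substitution)
```

Only two rows of the table are kept in memory at a time, which is enough because each cell depends only on the current and previous rows.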

The Levenshtein distance is used in many fields, for example:

Autocorrect algorithms: Text editors and messaging applications use the Levenshtein distance in their autocorrect features, for example keyboards such as Gboard and SwiftKey.

Data cleaning: It is widely used in data cleaning and normalization tasks to reduce redundancy and identify similar records during data mining.

Data clustering and classification: Clustering groups similar records together, while classification assigns class labels to similar records (source: https://www.geeksforgeeks.org/introduction-to-levenshtein-distance/).

For what the bot needs to achieve, Levenshtein distance does not account for certain types of "errors" that scammers exploit. For example:

Character swaps: If a scammer writes "Alephuim" instead of "Alephium," the Levenshtein distance counts the swapped pair as two separate substitutions (a distance of 2). It doesn't recognize this as a single transposition, which is a common trick in impersonation.
Visually similar characters: Some letters look alike (e.g., "l" and "I", "0" and "O"), but Levenshtein treats them as different, missing the deception.
Extra or missing spaces, symbols, or capitalization changes: Names like "TeamAlephium" vs. "Team_Alephium" may not trigger a high Levenshtein score, even though they are meant to confuse users.

Because of these limitations, a more advanced method, like phonetic algorithms (Soundex, Metaphone) or fuzzy string matching (Jaro-Winkler), might be needed to improve detection accuracy.

Jaro–Winkler distance

Jaro-Winkler combines two measures:

Jaro similarity: a measure of similarity between two strings. Its value ranges from 0 to 1, where 1 means the strings are equal and 0 means there is no similarity between them.
Jaro-Winkler: applies a prefix scaling factor ‘p’ that raises the score when the two strings share a common prefix, counted up to a defined maximum length l.
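For reference, the two measures are usually written as follows. The symbols m, t, s1 and s2 are introduced here for illustration: m is the number of matching characters (equal characters that are no further apart than floor(max(|s1|, |s2|) / 2) - 1 positions), and t is half the number of matched characters that appear in a different order. The common prefix length l is normally capped at 4 and p is commonly set to 0.1; these are the standard defaults, and we assume them here rather than quoting the bot's configuration.

```latex
\[
\mathrm{sim}_j =
\begin{cases}
0 & \text{if } m = 0,\\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise,}
\end{cases}
\qquad
\mathrm{sim}_w = \mathrm{sim}_j + \ell \, p \,\bigl(1 - \mathrm{sim}_j\bigr)
\]
```

As a worked example, for "diomark0" and "diomark" we get m = 7 and t = 0, so the Jaro similarity is (7/8 + 7/7 + 7/7) / 3 ≈ 0.9583. With a shared prefix of l = 4 and p = 0.1, the Jaro-Winkler similarity is 0.9583 + 0.4 × (1 - 0.9583) = 0.975.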

The interesting part about Jaro-Winkler is that it gives better results when strings share the same base, just like how scammers modify names while keeping them recognizable.

This is particularly useful for impersonation detection because scammers often make small changes at the end or middle of a name while keeping the beginning similar. For example:

"veganeutral" → "veganeuitral" (one extra character inserted) gives about 98.3% similarity
"diomark0" → "diomark" (remove letter at the end) gives 97.50% similarity

Since Jaro-Winkler gives more weight to similarities at the start of a string, it is effective at catching these deceptive modifications. This makes it a strong choice for detecting impersonators who tweak usernames while keeping them visually close to the original.
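The scores can be reproduced with a short, self-contained Python sketch. It assumes the standard parameters (scaling factor 0.1, prefix capped at 4 characters); the function names are ours and the bot's actual implementation may differ.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only match if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both strings' matched characters in order and count mismatches.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2  # each transposition is counted twice above
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return score + prefix * p * (1 - score)


print(f"{jaro_winkler('veganeutral', 'veganeuitral'):.2%}")  # 98.33%
print(f"{jaro_winkler('diomark0', 'diomark'):.2%}")          # 97.50%
```

Both results are close to the figures quoted above; small differences in the last digit can come from rounding or from implementation details.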

Jaro-Winkler wins

In the end, we decided to go with Jaro-Winkler because it gives good results when it comes to name impersonation. The time complexity of this algorithm is O(n*m), where n and m are the lengths of the two strings.

It prioritizes the start of the name
It handles transpositions well
It gives a similarity score (not just a raw number)
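Because the result is a score between 0 and 1, flagging a suspicious name comes down to a threshold check. The sketch below is hypothetical and is not the bot's actual code: the function `find_impersonated`, the `protected` mapping of names to Telegram user IDs, the case folding, and the 0.9 threshold are all assumptions made for illustration. To keep the example self-contained it uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer, which is not Jaro-Winkler; in practice you would pass a Jaro-Winkler function as `similarity`.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Optional, Tuple


def stand_in_similarity(a: str, b: str) -> float:
    """Placeholder score in [0, 1]. NOT Jaro-Winkler; swap in a real one."""
    return SequenceMatcher(None, a, b).ratio()


def find_impersonated(
    user_id: int,
    name: str,
    protected: Dict[str, int],
    similarity: Callable[[str, str], float] = stand_in_similarity,
    threshold: float = 0.9,  # assumed cut-off, to be tuned on real data
) -> Optional[Tuple[str, float]]:
    """Return (protected_name, score) for the closest match at or above
    the threshold, or None if the name does not look like an impersonation."""
    best: Optional[Tuple[str, float]] = None
    candidate = name.casefold()  # assumption: compare case-insensitively
    for protected_name, owner_id in protected.items():
        if user_id == owner_id:
            continue  # the legitimate owner always matches their own name
        score = similarity(candidate, protected_name.casefold())
        if score >= threshold and (best is None or score > best[1]):
            best = (protected_name, score)
    return best


# Hypothetical data: one protected username and its real owner's user ID.
protected = {"veganeutral": 1001}
print(find_impersonated(1001, "veganeutral", protected))   # None (real owner)
print(find_impersonated(42, "veganeuitral", protected))    # flagged
print(find_impersonated(42, "randomuser", protected))      # None
```

Comparing user IDs before comparing names matters: without that check, the real team member would be flagged for having a name identical to their own.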

Here’s an example of the bot in action, successfully identifying a scammer attempting to impersonate a team member or community moderator:

The bot is currently running across multiple groups with over 7,000 members, actively detecting potential scammers attempting to impersonate team members and community moderators. We have decided to open source the code.

Catch impersonators on Telegram
