How to Remove Emoji in .NET

Here is what I tried first, and how it went wrong.
I searched online for a regular expression that matches emoji and found this JavaScript one:
const emojiRegex = /[\u{1F300}-\u{1F5FF}\u{1F900}-\u{1F9FF}\u{1F600}-\u{1F64F}\u{1F680}-\u{1F6FF}\u{2600}-\u{26FF}\u{2700}-\u{27BF}\u{1F1E6}-\u{1F1FF}\u{1F191}-\u{1F251}\u{1F004}\u{1F0CF}\u{1F170}-\u{1F171}\u{1F17E}-\u{1F17F}\u{1F18E}\u{1F190}-\u{1F19A}]/gu;
I used DeepSeek to translate it to C#:
string emojiPattern = @"[\u1F300-\u1F5FF\u1F900-\u1F9FF\u1F600-\u1F64F\u1F680-\u1F6FF\u2600-\u26FF\u2700-\u27BF\u1F1E6-\u1F1FF\u1F191-\u1F251\u1F004\u1F0CF\u1F170-\u1F171\u1F17E-\u1F17F\u1F18E\u1F190-\u1F19A]";
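One reason this translation cannot work as intended: in a .NET regular expression, `\u` is followed by exactly four hexadecimal digits, so `\u1F300` is read as U+1F30 followed by a literal `0`, not as U+1F300. .NET also matches on UTF-16 code units, so a code point above U+FFFF has to be written as its surrogate pair. As a minimal sketch (the class and method names are illustrative, not part of the original code), the Emoticons range U+1F600 to U+1F64F could be expressed like this:

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative helper: names are assumptions, not from the article.
public static class SurrogateRegexDemo
{
    // U+1F600..U+1F64F encodes in UTF-16 as the high surrogate D83D
    // followed by a low surrogate in the range DE00..DE4F.
    private static readonly Regex Emoticons =
        new Regex(@"\uD83D[\uDE00-\uDE4F]");

    public static string RemoveEmoticons(string input) =>
        Emoticons.Replace(input, string.Empty);
}
```

Every other range above U+FFFF would need the same treatment, which gets verbose quickly and is part of why a code point loop can be easier to maintain.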
I tested the translated pattern with the following input:
๐ ๐ถ๐๐๐๐๐๐๐๐๐๐ ๆฏๆไธ้ฆๆญๆๅฅฝ็ๆนๅผๅฐฑๆฏๅฐๅฎ่ฎพไธบ้น้โช ๆๅๅคช้ณๆฏ่ฐ่ตทๅพๆดๆฉ ๐ฐ๐ธ๐ธ๐น ๐๐ธ๐๐๐พ๐๐ฐ ... ๐
Running it produced this result:
๐ ๐ถ๐๐๐๐๐๐๐๐๐๐ ๆฏๆไธ้ฆๆญๆๅฅฝ็ๆนๅผๅฐฑๆฏๅฐๅฎ่ฎพไธบ้น้โช ๆๅๅคช้ณๆฏ่ฐ่ตทๅพๆดๆฉ ๐ฐ๐ธ๐ธ๐น ๐๐ธ๐๐๐พ๐๐ฐ ... ๐
Nothing seemed to change, so I kept looking and found a Stack Overflow question that suggested using:
string emojiPattern = @"\p{Cs}";
This time the emoji were gone, but other Unicode characters that are not emoji were removed as well:
ๆฏๆไธ้ฆๆญๆๅฅฝ็ๆนๅผๅฐฑๆฏๅฐๅฎ่ฎพไธบ้น้โช ๆๅๅคช้ณๆฏ่ฐ่ตทๅพๆดๆฉ ...
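The over-removal follows from what `\p{Cs}` means: it matches the Unicode category "Surrogate", and in a .NET string every character above U+FFFF is stored as two surrogate code units. The pattern therefore strips all such characters, emoji or not. A small sketch to illustrate this, with a hypothetical helper name and hand-picked sample characters:

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative helper: the name is an assumption, not from the article.
public static class SurrogateCategoryDemo
{
    // Removes every UTF-16 surrogate code unit, which means every
    // character outside the Basic Multilingual Plane disappears.
    public static string StripSurrogates(string input) =>
        Regex.Replace(input, @"\p{Cs}", string.Empty);
}
```

For example, `StripSurrogates("x\uD835\uDCB6y\uD83D\uDE00z")` removes both the mathematical script letter U+1D4B6 and the emoji U+1F600, leaving only `xyz`.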
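The approach that worked for me is to walk the string one code point at a time and drop only the code points that fall inside known emoji ranges: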
using System;
using System.Text;

public static class EmojiRemover
{
    public static string RemoveEmojis(string input)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        var stringBuilder = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length;)
        {
            // Check whether the current char is one half of a surrogate pair
            if (char.IsSurrogate(input[i]))
            {
                // A valid pair is a high surrogate followed by a low surrogate
                if (i + 1 < input.Length && char.IsSurrogatePair(input[i], input[i + 1]))
                {
                    // Combine both halves into a single code point
                    int codePoint = char.ConvertToUtf32(input, i);
                    if (!IsUnicodeEmoji(codePoint))
                    {
                        stringBuilder.Append(input[i]);
                        stringBuilder.Append(input[i + 1]);
                    }
                    i += 2;
                }
                else
                {
                    // Unpaired surrogate: drop it and move on
                    i++;
                }
            }
            else
            {
                // Ordinary BMP character: one char is one code point
                int codePoint = char.ConvertToUtf32(input, i);
                if (!IsUnicodeEmoji(codePoint))
                {
                    stringBuilder.Append(input[i]);
                }
                i++;
            }
        }
        return stringBuilder.ToString();
    }

    private static bool IsUnicodeEmoji(int codePoint)
    {
        // Common emoji ranges (approximate, not an exhaustive list)
        return (0x1F600 <= codePoint && codePoint <= 0x1F64F) || // Emoticons
               (0x1F300 <= codePoint && codePoint <= 0x1F5FF) || // Miscellaneous Symbols and Pictographs
               (0x1F680 <= codePoint && codePoint <= 0x1F6FF) || // Transport and Map Symbols
               (0x1F1E0 <= codePoint && codePoint <= 0x1F1FF) || // Regional indicators (flags)
               (0x1F900 <= codePoint && codePoint <= 0x1F9FF) || // Supplemental Symbols and Pictographs
               (0x1F004 <= codePoint && codePoint <= 0x1F0FF) || // Mahjong tile through Playing Cards
               (0x2600 <= codePoint && codePoint <= 0x26FF) ||   // Miscellaneous Symbols
               (0x2700 <= codePoint && codePoint <= 0x27BF) ||   // Dingbats
               (0x1F700 <= codePoint && codePoint <= 0x1F77F) || // Alchemical Symbols
               (0x1F780 <= codePoint && codePoint <= 0x1F7FF) || // Geometric Shapes Extended
               (0x1F800 <= codePoint && codePoint <= 0x1F8FF);   // Supplemental Arrows-C
    }
}
Running it gives the result I wanted:
๐ถ๐๐๐๐๐๐๐๐๐๐ ๆฏๆไธ้ฆๆญๆๅฅฝ็ๆนๅผๅฐฑๆฏๅฐๅฎ่ฎพไธบ้น้โช ๆๅๅคช้ณๆฏ่ฐ่ตทๅพๆดๆฉ ๐ฐ๐ธ๐ธ๐น ๐๐ธ๐๐๐พ๐๐ฐ ...
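On .NET Core 3.0 or later, the same idea can be sketched with `System.Text.Rune`, which takes care of the surrogate-pair bookkeeping. This is an alternative under stated assumptions rather than the code used above: the class name is hypothetical, the predicate covers only two of the ranges to keep the example short, and unlike the loop above, `EnumerateRunes` yields U+FFFD for an unpaired surrogate instead of dropping it.

```csharp
using System;
using System.Text;

// Illustrative alternative: class name and reduced range list are assumptions.
public static class RuneEmojiRemover
{
    // Shortened predicate for the sketch; extend it with the full range list as needed.
    private static bool IsEmoji(int codePoint) =>
        (codePoint >= 0x1F600 && codePoint <= 0x1F64F) ||  // Emoticons
        (codePoint >= 0x1F300 && codePoint <= 0x1F5FF);    // Miscellaneous Symbols and Pictographs

    public static string RemoveEmojis(string input)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        var sb = new StringBuilder(input.Length);
        // EnumerateRunes yields one Rune per Unicode code point.
        foreach (Rune rune in input.EnumerateRunes())
        {
            if (!IsEmoji(rune.Value))
                sb.Append(rune.ToString());
        }
        return sb.ToString();
    }
}
```

Calling `RuneEmojiRemover.RemoveEmojis("hi\uD83D\uDE00\uD835\uDCB6")` removes the emoji U+1F600 and keeps the mathematical script letter U+1D4B6.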
Written by tom yang
