How to remove emoji from strings in .NET

tom yang

Here is what I tried first, and how it went wrong.

I searched online for a regular expression that matches emoji and found this JavaScript one:

const emojiRegex = /[\u{1F300}-\u{1F5FF}\u{1F900}-\u{1F9FF}\u{1F600}-\u{1F64F}\u{1F680}-\u{1F6FF}\u{2600}-\u{26FF}\u{2700}-\u{27BF}\u{1F1E6}-\u{1F1FF}\u{1F191}-\u{1F251}\u{1F004}\u{1F0CF}\u{1F170}-\u{1F171}\u{1F17E}-\u{1F17F}\u{1F18E}\u{1F190}-\u{1F19A}]/gu;

Then I had DeepSeek translate it into C#:

string emojiPattern = @"[\u1F300-\u1F5FF\u1F900-\u1F9FF\u1F600-\u1F64F\u1F680-\u1F6FF\u2600-\u26FF\u2700-\u27BF\u1F1E6-\u1F1FF\u1F191-\u1F251\u1F004\u1F0CF\u1F170-\u1F171\u1F17E-\u1F17F\u1F18E\u1F190-\u1F19A]";
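Why did the translated pattern do nothing? As far as I can tell, for two reasons. In .NET regular expressions `\u` takes exactly four hex digits, so `\u1F300` is read as `\u1F30` followed by a literal `0`, and the character class turns into accidental ranges such as `0-\u1F5F`; none of them happen to match anything in the test string below. Also, .NET strings and `Regex` work on UTF-16 code units, so an emoji such as 🚀 (U+1F680) is stored as two `char`s (a surrogate pair) and cannot be a single endpoint of a character-class range. The small check below illustrates both points; `RegexEscapeCheck` and its methods are hypothetical names made up for this post.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, only used to show why the translated pattern fails.
public static class RegexEscapeCheck
{
    // In .NET regex syntax \u takes exactly four hex digits, so @"\u1F300"
    // means U+1F30 followed by a literal '0', not U+1F300.
    public static bool EscapeIsSplit() =>
        Regex.IsMatch("\u1F30" + "0", @"^\u1F300$");

    // .NET strings store UTF-16 code units; a character above U+FFFF
    // (such as U+1F680, the rocket emoji) needs two chars, a surrogate pair.
    public static int Utf16Length(int codePoint) =>
        char.ConvertFromUtf32(codePoint).Length;

    // The first range of the translated pattern. Because of the split escapes it
    // parses as U+1F30, the range '0'-U+1F5F, and 'F', which covers ASCII letters.
    public static bool TranslatedClassMatches(string s) =>
        Regex.IsMatch(s, @"[\u1F300-\u1F5FF]");
}
```

`EscapeIsSplit()` returns `true`, `Utf16Length(0x1F680)` returns 2, and `TranslatedClassMatches("A")` is `true` while the rocket emoji does not match at all: the translated pattern targets ordinary letters and digits rather than emoji.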

I tested it on this string, which mixes emoji (🚀, 🌟), stylized Unicode letters, Chinese text and the circled digit ⓪:

๐Ÿš€ ๐™ถ๐š˜๐š˜๐š๐š–๐š˜๐š›๐š—๐š’๐š—๐š ๆฏๆŽ‰ไธ€้ฆ–ๆญŒๆœ€ๅฅฝ็š„ๆ–นๅผๅฐฑๆ˜ฏๅฐ†ๅฎƒ่ฎพไธบ้—น้’Ÿโ“ช ๆˆ‘ๅ’Œๅคช้˜ณๆฏ”่ฐ่ตทๅพ—ๆ›ดๆ—ฉ ๐“ฐ๐“ธ๐“ธ๐’น ๐“‚๐“ธ๐“‡๐“ƒ๐’พ๐“ƒ๐“ฐ ... ๐ŸŒŸ

Running the pattern over it gave:

๐Ÿš€ ๐™ถ๐š˜๐š˜๐š๐š–๐š˜๐š›๐š—๐š’๐š—๐š ๆฏๆŽ‰ไธ€้ฆ–ๆญŒๆœ€ๅฅฝ็š„ๆ–นๅผๅฐฑๆ˜ฏๅฐ†ๅฎƒ่ฎพไธบ้—น้’Ÿโ“ช ๆˆ‘ๅ’Œๅคช้˜ณๆฏ”่ฐ่ตทๅพ—ๆ›ดๆ—ฉ ๐“ฐ๐“ธ๐“ธ๐’น ๐“‚๐“ธ๐“‡๐“ƒ๐’พ๐“ƒ๐“ฐ ... ๐ŸŒŸ

Nothing changed, so I looked up another question on Stack Overflow, where the suggested pattern was:

string emojiPattern = @"\p{Cs}";

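Running it showed that other characters were removed as well. `\p{Cs}` matches UTF-16 surrogates, that is, both halves of every character above U+FFFF, so the stylized letters disappeared along with the emoji: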

  ๆฏๆŽ‰ไธ€้ฆ–ๆญŒๆœ€ๅฅฝ็š„ๆ–นๅผๅฐฑๆ˜ฏๅฐ†ๅฎƒ่ฎพไธบ้—น้’Ÿโ“ช ๆˆ‘ๅ’Œๅคช้˜ณๆฏ”่ฐ่ตทๅพ—ๆ›ดๆ—ฉ   ...

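What finally worked for me was to walk through the string one code point at a time and skip code points that fall in common emoji ranges: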

using System;
using System.Text;

public static class EmojiRemover
{

    public static string RemoveEmojis(string input)
    {
        if (string.IsNullOrEmpty(input))
            return input;

        var stringBuilder = new StringBuilder();

        for (int i = 0; i < input.Length;)
        {
            // Is this char one half of a UTF-16 surrogate pair (part of a character above U+FFFF)?
            if (Char.IsSurrogate(input[i]))
            {
                // Make sure a matching surrogate follows, then combine both chars into one code point
                if (i + 1 < input.Length && Char.IsSurrogatePair(input[i], input[i + 1]))
                {
                    int codePoint = Char.ConvertToUtf32(input, i);
                    if (!IsUnicodeEmoji(codePoint))
                    {
                        stringBuilder.Append(input[i]);
                        stringBuilder.Append(input[i + 1]);
                    }
                    i += 2;
                }
                else
                {
                    // Unpaired (invalid) surrogate: drop it
                    i++;
                }
            }
            else
            {
                // Not a surrogate: a BMP character, whose code point is the char value itself
                int codePoint = Char.ConvertToUtf32(input, i);
                if (!IsUnicodeEmoji(codePoint))
                {
                    stringBuilder.Append(input[i]);
                }
                i++;
            }
        }

        return stringBuilder.ToString();
    }

    private static bool IsUnicodeEmoji(int codePoint)
    {
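        // Common emoji-related Unicode blocks (not an exhaustive list)
        return (0x1F600 <= codePoint && codePoint <= 0x1F64F) || // Emoticons
               (0x1F300 <= codePoint && codePoint <= 0x1F5FF) || // Miscellaneous Symbols and Pictographs
               (0x1F680 <= codePoint && codePoint <= 0x1F6FF) || // Transport and Map Symbols
               (0x1F1E0 <= codePoint && codePoint <= 0x1F1FF) || // Regional indicator symbols (flags)
               (0x1F900 <= codePoint && codePoint <= 0x1F9FF) || // Supplemental Symbols and Pictographs
               (0x1F004 <= codePoint && codePoint <= 0x1F0FF) || // Mahjong tiles, dominoes, playing cards
               (0x2694 <= codePoint && codePoint <= 0x269F) ||   // already covered by the next range
               (0x2600 <= codePoint && codePoint <= 0x26FF) ||   // Miscellaneous Symbols
               (0x2700 <= codePoint && codePoint <= 0x27BF) ||   // Dingbats
               (0x1F700 <= codePoint && codePoint <= 0x1F77F) || // Alchemical Symbols
               (0x1F780 <= codePoint && codePoint <= 0x1F7FF) || // Geometric Shapes Extended
               (0x1F800 <= codePoint && codePoint <= 0x1F8FF) || // Supplemental Arrows-C
               (0x1F980 <= codePoint && codePoint <= 0x1F9E0);   // already covered by 0x1F900-0x1F9FF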
    }
}

This time only the emoji are gone; the stylized letters, the Chinese text and ⓪ are all kept:

 ๐™ถ๐š˜๐š˜๐š๐š–๐š˜๐š›๐š—๐š’๐š—๐š ๆฏๆŽ‰ไธ€้ฆ–ๆญŒๆœ€ๅฅฝ็š„ๆ–นๅผๅฐฑๆ˜ฏๅฐ†ๅฎƒ่ฎพไธบ้—น้’Ÿโ“ช ๆˆ‘ๅ’Œๅคช้˜ณๆฏ”่ฐ่ตทๅพ—ๆ›ดๆ—ฉ ๐“ฐ๐“ธ๐“ธ๐’น ๐“‚๐“ธ๐“‡๐“ƒ๐’พ๐“ƒ๐“ฐ ...