How To Count Strings With Emojis In JavaScript

Jeff Jakinovich
3 min read

I love emojis. Who doesn’t?

I was polishing off a highly intellectual X post a few days ago when I realized something.

Emojis aren’t counted the same as regular characters

When you type emojis into the new post box on X, you can see that each emoji counts for more than a regular character.

After a quick search, I found out it has something to do with how they are encoded in the Unicode system.

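Essentially, a JavaScript string's length counts UTF-16 code units, not the characters you see. Most emojis take up two code units, and many (skin tones, flags, family emojis) are built from several code points joined together, so a single emoji can add 2 or more to the count.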
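To see the different ways of "counting" side by side, here is a small standalone sketch in plain JavaScript (runnable in a recent browser console or Node.js). The helper name compareCounts and the sample strings are just for illustration. It compares length (UTF-16 code units), Array.from (code points), and Intl.Segmenter (graphemes, i.e. user-perceived characters).

```javascript
// Hypothetical helper: reports three different "lengths" for the same string.
function compareCounts(text) {
  return {
    // UTF-16 code units: what String.prototype.length reports
    codeUnits: text.length,
    // Unicode code points: Array.from iterates a string by code point
    codePoints: Array.from(text).length,
    // Grapheme clusters: what a reader perceives as single characters
    graphemes: Array.from(new Intl.Segmenter().segment(text)).length,
  };
}

const samples = {
  plain: "hello",
  thumbsUp: "\u{1F44D}", // 👍: one code point outside the Basic Multilingual Plane
  thumbsUpMediumSkin: "\u{1F44D}\u{1F3FD}", // 👍🏽: emoji + skin tone modifier
  family: "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}", // 👨‍👩‍👧: three emojis joined by ZWJ
};

for (const [name, text] of Object.entries(samples)) {
  console.log(name, compareCounts(text));
}
```

For the family emoji, length reports 8 and the code point count is 5, yet a reader sees a single character, which is exactly what the grapheme count of 1 reflects.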

Regardless of why it happens, I thought about all the text counters I’ve created and how many exist in SaaS land.

Emojis are not getting their fair shake 😢.

Simply taking the length of the string isn’t an accurate count. Take, for example, something like this:

import { useState } from "react";

export default function App() {
  const [text, setText] = useState("");

  function countString() {
    // Counts UTF-16 code units, so most emojis add 2 or more to the total
    return text.length;
  }

  function handleChange(e) {
    setText(e.target.value);
  }

  return (
    <div className="App">
      <h1>Make the emojis count 👍</h1>
      <textarea value={text} onChange={handleChange} />
      <small>Characters: {countString()}</small>
    </div>
  );
}

This is a simple React component that tracks the characters typed into a text field. It is probably the most common way this feature gets implemented.

But the output gives us the same problem as my X post:

Modern web development makes it easy to count characters accurately

You can use a built-in object called Intl.Segmenter.

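The object has much broader uses: it splits a string into locale-sensitive segments at a granularity you choose, either graphemes (user-perceived characters, the default), words, or sentences. Counting graphemes is more accurate than counting code points or code units, because a grapheme groups together everything that renders as a single character.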
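As a rough sketch of those other granularities, assuming an English ("en") locale and a made-up sample sentence, the same object can also count words and sentences. With word granularity, each segment carries an isWordLike flag that separates actual words from the spaces and punctuation between them.

```javascript
const sample = "Hello, world! How are you?";

// Default granularity is "grapheme" (user-perceived characters).
const graphemeSegmenter = new Intl.Segmenter("en");
// "word" yields words plus the spaces and punctuation between them.
const wordSegmenter = new Intl.Segmenter("en", { granularity: "word" });
// "sentence" yields one segment per sentence.
const sentenceSegmenter = new Intl.Segmenter("en", { granularity: "sentence" });

// Each segment is an object with segment, index, and input (plus isWordLike for words).
const graphemeCount = Array.from(graphemeSegmenter.segment(sample)).length;
const wordCount = Array.from(wordSegmenter.segment(sample)).filter((s) => s.isWordLike).length;
const sentenceCount = Array.from(sentenceSegmenter.segment(sample)).length;

console.log({ graphemeCount, wordCount, sentenceCount });
```

Because the sample is plain ASCII, the grapheme count here matches sample.length; the word and sentence counts show the broader use cases.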

To fix our example above, all we have to do is update our countString function like this:

import { useState } from "react";

export default function App() {
  const [text, setText] = useState("");

  function countString() {
    // Split the text into graphemes (the default granularity) and count them
    return Array.from(new Intl.Segmenter().segment(text)).length;
  }

  function handleChange(e) {
    setText(e.target.value);
  }

  return (
    <div className="App">
      <h1>Make the emojis count 👍</h1>
      <textarea value={text} onChange={handleChange} />
      <small>Characters: {countString()}</small>
    </div>
  );
}

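We create a new instance of Intl.Segmenter (its default granularity is "grapheme") and pass our text to its segment method. That returns an iterable of segments, which we turn into an array before finally taking its length. The result counts user-perceived characters, so it is far more accurate than simply taking the length of the original string.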

Here is the result:

So why doesn’t X count an emoji correctly?

Short answer: I have no idea.

I’ve been programming far too long to delude myself into thinking there is a simple answer.

But Intl.Segmenter has good support across modern browsers, and for text the size of a post, any performance or memory cost should be negligible.
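As a rough sketch of how to keep that cost low and still cover older environments, assuming you want one reusable helper (the name countCharacters is hypothetical), you can create the segmenter once at module level instead of on every render, count segments with a for...of loop so no intermediate array is built, and feature-detect Intl.Segmenter. The code point fallback is an assumption of this sketch, not part of Intl: it counts 👍 correctly but still overcounts skin tone and family emojis.

```javascript
// Feature-detect once; Intl.Segmenter may be missing in older browsers
// or in JavaScript runtimes built without ICU data.
const segmenter =
  typeof Intl !== "undefined" && typeof Intl.Segmenter === "function"
    ? new Intl.Segmenter(undefined, { granularity: "grapheme" })
    : null;

// Hypothetical helper: counts user-perceived characters in a string.
function countCharacters(text) {
  if (segmenter === null) {
    // Assumed fallback: Array.from iterates by code point, so 👍 counts as 1,
    // but 👍🏽 (emoji + skin tone modifier) still counts as 2.
    return Array.from(text).length;
  }
  let count = 0;
  // segment() returns an iterable of segments; only their number matters here.
  for (const _segment of segmenter.segment(text)) {
    count += 1;
  }
  return count;
}

console.log(countCharacters("Make the emojis count \u{1F44D}")); // 23
```

Inside the component, countString could then simply return countCharacters(text). Note that text.length would report 24 for the same heading, because 👍 takes two UTF-16 code units.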

My best guess is that the codebase is so large and so old that it isn’t worth the side effects of a refactor.

I’d be happy to learn more if anyone has better insight into this.

I hope this helps 😄.

Happy coding 🤙.
