Here's how OpenAI Token count is computed in Tiktokenizer - Part 4

Ramu Narasinga

In this article, we will review how the OpenAI token count is computed and displayed in Tiktokenizer. We will look at:

  1. Text preview

  2. Token IDs preview

Text preview

When you type a message in https://tiktokenizer.vercel.app/, the application shows the token count along with a preview of the text and a preview of the token IDs.

So how is this text preview rendered?

At line 67 of TokenViewer.tsx, you will find the following code:

<pre className="min-h-[256px] max-w-[100vw] overflow-auto whitespace-pre-wrap break-all rounded-md border bg-slate-50 p-4 shadow-sm">
  {props.data?.segments?.map(({ text }, idx) => (
    <span
      key={idx}
      onMouseEnter={() => setIndexHover(idx)}
      onMouseLeave={() => setIndexHover(null)}
      className={cn(
        "transition-all",
        (indexHover == null || indexHover === idx) &&
          COLORS[idx % COLORS.length],
        props.isFetching && "opacity-50"
      )}
    >
      {showWhitespace || indexHover === idx
        ? encodeWhitespace(text)
        : text}
    </span>
  ))}
</pre>

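The code above renders the text preview.

It maps over the segments array and wraps each segment's text in a span. The background colour of each span is picked from COLORS by segment index, so neighbouring segments are easy to tell apart. Hovering over a segment sets indexHover, which leaves only that segment coloured and passes its text through encodeWhitespace.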
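To make the class selection easier to follow, here is a minimal sketch of the same logic as plain functions. The palette values are hypothetical (the real COLORS array is defined in the repository and is not shown in this article), and `encodeWhitespaceSketch` is only a stand-in, since the actual encodeWhitespace implementation is not shown here either.

```typescript
// Hypothetical palette. The real COLORS array in the repository may differ.
const COLORS: string[] = ["bg-sky-200", "bg-amber-200", "bg-blue-200", "bg-green-200"];

// Mirrors the condition in the JSX above: a segment is coloured when nothing
// is hovered, or when it is the hovered segment itself.
function segmentClasses(
  idx: number,
  indexHover: number | null,
  isFetching: boolean
): string {
  const classes: string[] = ["transition-all"];
  if (indexHover === null || indexHover === idx) {
    // The modulo wraps around, so the palette repeats for long inputs.
    const color = COLORS[idx % COLORS.length];
    if (color !== undefined) classes.push(color);
  }
  if (isFetching) classes.push("opacity-50");
  return classes.join(" ");
}

// Stand-in for encodeWhitespace (assumption): replaces whitespace characters
// with visible markers. The symbols used by the real helper may differ.
function encodeWhitespaceSketch(text: string): string {
  return text.replace(/ /g, "·").replace(/\t/g, "→");
}

console.log(segmentClasses(5, null, false)); // 5 % 4 = 1, so the second colour
console.log(segmentClasses(1, 0, false)); // another segment is hovered: no colour
console.log(encodeWhitespaceSketch("a b\tc"));
```

Because the colour depends only on the index, the same segment keeps the same colour in both previews, which is what lets the hover state link them.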

Token IDs preview

In this section, we will look at the token IDs preview.

At line 87 of TokenViewer.tsx, you will find the code below:

<pre
  className={
    "min-h-[256px] max-w-[100vw] overflow-auto whitespace-pre-wrap break-all rounded-md border bg-slate-50 p-4 shadow-sm"
  }
>
  {props.data && tokenCount > 0 && (
    <span
      className={cn(
        "transition-opacity",
        props.isFetching && "opacity-50"
      )}
    >
      {props.data?.segments?.map((segment, segmentIdx) => (
        <Fragment key={segmentIdx}>
          {segment.tokens.map((token) => (
            <Fragment key={token.idx}>
              <span
                onMouseEnter={() => setIndexHover(segmentIdx)}
                onMouseLeave={() => setIndexHover(null)}
                className={cn(
                  "transition-colors",
                  indexHover === segmentIdx &&
                    COLORS[segmentIdx % COLORS.length]
                )}
              >
                {token.id}
              </span>
              <span className="last-of-type:hidden">{", "}</span>
            </Fragment>
          ))}
        </Fragment>
      ))}
    </span>
  )}
</pre>

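The code above renders the token IDs preview. For every segment it loops over segment.tokens and prints each token.id followed by a ", " separator. Since Fragment adds no DOM element, all of these spans are siblings, so the last-of-type:hidden class hides only the final separator. Hovering over a token ID sets indexHover to the index of its segment, which highlights the IDs of that segment and the matching segment in the text preview.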
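The nested loop is easier to see without the JSX. Below is a minimal sketch, assuming the segment shape implied by the property accesses above (text, tokens, id, idx). The helper names and the sample token IDs are made up for illustration, and computing the count by summing the tokens of each segment is an assumption, since the article does not show how tokenCount is derived.

```typescript
// Shapes inferred from the JSX above; the real types in the repository may
// carry more fields.
interface Token {
  id: number; // token ID as produced by the tokenizer
  idx: number; // position of the token, used as the React key
}

interface Segment {
  text: string;
  tokens: Token[]; // a segment may hold more than one token
}

// Flattens segments into one list of IDs, remembering which segment each ID
// came from. That segment index is what the hover handlers pass to
// setIndexHover.
function flattenTokenIds(segments: Segment[]): { id: number; segmentIdx: number }[] {
  const result: { id: number; segmentIdx: number }[] = [];
  segments.forEach((segment, segmentIdx) => {
    for (const token of segment.tokens) {
      result.push({ id: token.id, segmentIdx });
    }
  });
  return result;
}

// Plain-text equivalent of the preview: IDs separated by ", " with no
// trailing separator.
function renderTokenIds(segments: Segment[]): string {
  return flattenTokenIds(segments)
    .map((t) => String(t.id))
    .join(", ");
}

// Assumed definition of the token count: total tokens across all segments.
function countTokens(segments: Segment[]): number {
  return segments.reduce((sum, segment) => sum + segment.tokens.length, 0);
}

// Illustrative data only. These are not real tokenizer outputs.
const sample: Segment[] = [
  { text: "Hello", tokens: [{ id: 101, idx: 0 }] },
  { text: " there", tokens: [{ id: 102, idx: 1 }, { id: 103, idx: 2 }] },
];

console.log(renderTokenIds(sample)); // "101, 102, 103"
console.log(countTokens(sample)); // 3
```

In this sketch both IDs of the second segment share segmentIdx 1, which is why hovering either of them in the preview would highlight the same piece of text.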

About me:

Hey, my name is Ramu Narasinga. I study codebase architecture in large open-source projects.

Email: ramu.narasinga@gmail.com

Want to learn from open-source code? Solve challenges inspired by open-source projects.

References:

  1. https://github.com/dqbd/tiktokenizer/blob/master/src/sections/TokenViewer.tsx#L67

  2. https://github.com/dqbd/tiktokenizer/blob/master/src/sections/TokenViewer.tsx#L87

  3. https://tiktokenizer.vercel.app/
