šŸ¤– What are Tokens?

Bhavya Jain
2 min read

You have probably heard about tokens whenever an LLM is involved, in terms like input tokens or output tokens. Most people assume a token is just another word for "word", but it means something a bit different.

Tokens are the building blocks of text that OpenAI models process. They can be as short as a single character or as long as a full word, depending on the language and context. Spaces, punctuation, and partial words all contribute to token counts. This is how the API internally segments your text before generating a response.

Let’s take an example:

ā€œHello World! My name is Bhavya Jain.ā€

If you run this sentence through OpenAI's tokenizer, you can see how it is broken down into tokens. Common words usually map to a single token each, while other words (like an uncommon name) are broken into smaller pieces, with each character or group of characters getting its own token.

Each token also maps to a numeric ID, so the whole sentence becomes an array of token IDs.
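
If you want to see this for yourself, here is a minimal sketch using OpenAI's tiktoken library. The exact split and the IDs depend on which model's encoding you pick, so treat the printed output as illustrative rather than the one true answer:

```python
# pip install tiktoken
import tiktoken

# Pick the encoding used by a specific model; different models use different encodings.
enc = tiktoken.encoding_for_model("gpt-4o")

text = "Hello World! My name is Bhavya Jain."

# Encode the text into token IDs (the array of numbers mentioned above).
token_ids = enc.encode(text)
print(token_ids)

# Decode each ID back into its text piece to see how the sentence was split.
pieces = [
    enc.decode_single_token_bytes(t).decode("utf-8", errors="replace")
    for t in token_ids
]
print(pieces)
print(f"{len(token_ids)} tokens")
```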

Each model has its own token mapping (its tokenizer), and that is also how usage is charged: the number of input tokens you send plus the number of output tokens generated from them is counted for billing. Remember, the longer the output and the more functionality you ask for, the more tokens get generated.
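As a rough illustration of how that billing works, here is a small sketch that counts input and output tokens with tiktoken and multiplies them by per-token prices. The prices below are made-up placeholders for the example, not real rates, so always check your provider's pricing page:

```python
import tiktoken

# Hypothetical prices per 1,000 tokens, for illustration only.
PRICE_PER_1K_INPUT = 0.005
PRICE_PER_1K_OUTPUT = 0.015

def estimate_cost(prompt: str, completion: str, model: str = "gpt-4o") -> float:
    """Estimate the cost of one request from its input and output token counts."""
    enc = tiktoken.encoding_for_model(model)
    input_tokens = len(enc.encode(prompt))
    output_tokens = len(enc.encode(completion))
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

print(estimate_cost("Hello World! My name is Bhavya Jain.", "Nice to meet you, Bhavya!"))
```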

These tokens are then turned into vector embeddings, which capture relationships and semantic meaning. If you want to know more about vector embeddings, I would highly recommend reading my blog about Vector Embeddings, where I explain them in simpler terms so that you can even explain them to your mom.

😊 Stay tuned for more talks about AI…
