Inner Workings of LLMs: Decoding AI Jargon


Talking to ChatGPT feels like a dream come true for some and a nightmare for others. Some people use it because it gives them someone to talk to without the fear of being judged.

For technical folks, it can feel like a nightmare: it provides the answer in just a few prompts, makes changes to their code, and everyone starts to vibe with it. And then there is another species entirely, the ones who try to jailbreak AI. They are the ones who want to turn the world into a science-fiction movie where AI controls the humans.

Now, back to the original question: how does AI work? Is there a genie inside Gemini or a god inside Grok? And why does it feel magical that it has an answer to most questions? In reality, it works much like the human brain: it has been given access to a large volume of data and has gone through years of training and fine-tuning. At a high level, there is a brain-like structure in which neurons are replaced by computing units, each with weights attached. Training adjusts those weights until we get the desired output.

I know it can be confusing, so think of a child in the learning phase, for whom every spherical thing is a ball. But we say "no", and after re-evaluating, the child starts to tell the difference: what is an orange and what is a ball.

All LLMs are built on these kinds of neural networks and respond to the user’s query by simply predicting the next word.

Now let's understand how all the pieces of a GPT (Generative Pre-trained Transformer) work. Simply put, it tries to generate (or predict) the next token (or word) by drawing on its training data (the information available to it).

There are several steps involved in the process of understanding the user prompt and generating the complete response. Let's discuss them.

Step 1: Tokenization -

First, we get the prompt in natural language (human-readable text), but our system can't understand it directly. It uses various techniques to frame the meaning of the prompt.

This involves breaking the prompt down into tokens. There are various techniques for doing it, and different models follow different approaches.

Some of them:

Word-level Tokenization - Just break the text down into individual words.

Sub-word-level Tokenization - Break words down into smaller chunks, such as 2-3 character pieces, grouped according to some notion of similarity.

Byte-Pair Encoding (BPE) - A kind of hybrid technique generally used by LLMs nowadays. It breaks prompts down at the word, sub-word, or character level, depending on the data the model was trained on, using a fairly involved algorithm, and assigns each piece a token ID (see the sketch below).
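To make this concrete, here is a minimal sketch using OpenAI's open-source tiktoken library, one real BPE implementation (other models ship their own tokenizers):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is a BPE vocabulary used by several OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

token_ids = enc.encode("The River bank")
print(token_ids)                              # a short list of integer token IDs
print([enc.decode([t]) for t in token_ids])   # the text piece behind each ID
```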

Step 2: Embedding -

But tokenization alone is not enough to get a complete understanding of the context of the prompt, so the LLM uses vector embeddings.

It converts each token ID into a vector embedding: a long list of numbers that positions the token in a high-dimensional space, where tokens with similar meanings land close together (the same idea powers the vector databases used for semantic search). Mapping tokens into this space is what gives the LLM a deeper understanding of the prompt.

But vector embeddings alone don't capture word order, so we also need Positional Encoding. In this process, each token's vector is adjusted according to the token's position in the sentence, so that "dog bites man" and "man bites dog" don't look identical to the model.
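Here is a small sketch of both ideas in NumPy. The embedding table is random here (in a real model it is learned during training), and the positional encoding uses the sinusoidal scheme from the original Transformer paper:

```python
import numpy as np

vocab_size, d_model, seq_len = 50_000, 64, 4

# Embedding table: one vector per token ID (random here, learned in practice)
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([791, 11188, 6201, 13])  # hypothetical IDs from the tokenizer
x = embedding_table[token_ids]                # (seq_len, d_model) token embeddings

# Sinusoidal positional encoding: a fixed pattern that varies with position
pos = np.arange(seq_len)[:, None]
i = np.arange(d_model // 2)[None, :]
angles = pos / (10_000 ** (2 * i / d_model))
pe = np.zeros((seq_len, d_model))
pe[:, 0::2] = np.sin(angles)
pe[:, 1::2] = np.cos(angles)

x = x + pe   # now each vector also carries its position in the sentence
```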

Step 3: Multi-head Attention -

Before understanding this, we need to understand self-attention. Self-attention lets tokens interact with each other, so that each token's vector can be adjusted based on its neighbours, giving a clearer representation.

E.g. Sentence 1 - “The river bank.” Sentence 2 - “The Reserve Bank.” Here we have ‘bank’, one word with two different meanings; we can only get the sense from the word before it.

Multi-head attention allows the model to gain even more context. It is like understanding (or evaluating) a situation by looking at it from several different aspects at once, the way we sometimes judge things from multiple angles in day-to-day life. Each "head" looks at the sentence in its own way, and their views are combined.
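A minimal single-head sketch in NumPy (random weights stand in for learned ones; multi-head attention simply runs several heads like this in parallel, each with its own Wq/Wk/Wv, and concatenates their outputs):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    """One attention head: every token looks at every other token."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # relevance of each token to each other
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # context-aware token vectors

rng = np.random.default_rng(0)
d_model, d_head, seq_len = 64, 16, 3
x = rng.normal(size=(seq_len, d_model))      # e.g. embeddings of "The River bank"

out = self_attention(x,
                     rng.normal(size=(d_model, d_head)),
                     rng.normal(size=(d_model, d_head)),
                     rng.normal(size=(d_model, d_head)))
print(out.shape)  # (3, 16)
```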

Step 4: Linear -

The linear layer helps predict the next word (or token). It turns the model's internal representation into a score for every token in the vocabulary, which is then read as a probability for each candidate.

E.g. User: “Who are you?” The linear layer generates a score for every token, which ends up looking like: I: 85%, u: 12%, t: 2%, etc.

So it tells us which tokens could fit next, and how likely each one is.
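A sketch of that projection (the weights are random here; in a real model they are learned during training):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size = 64, 50_000

hidden = rng.normal(size=(d_model,))            # the model's vector for the last position
W_out = rng.normal(size=(d_model, vocab_size))  # learned projection onto the vocabulary

logits = hidden @ W_out   # one raw score ("logit") per token in the vocabulary
print(logits.shape)       # (50000,)
```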

Step 5: Softmax -

Softmax turns the scores from the linear layer into probabilities and picks the next word from those possibilities. There is a parameter named Temperature, which plays a role in how the pick happens. If the temperature is low, the model essentially picks the highest-probability token. If the temperature is high, the distribution flattens out, and the model gets more creative by sometimes picking other tokens.
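A minimal sketch of temperature sampling in NumPy (the three logits are made up for illustration):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng(0)):
    """Turn logits into probabilities with softmax, then sample one token."""
    z = logits / temperature            # low temp sharpens, high temp flattens
    z = z - z.max()                     # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return rng.choice(len(probs), p=probs)

logits = np.array([4.0, 2.0, 1.0])      # made-up scores for "I", "u", "t"
for temp in (0.1, 1.0, 2.0):
    picks = [sample_next_token(logits, temp) for _ in range(1000)]
    print(temp, np.bincount(picks, minlength=3) / 1000)
```

At temperature 0.1, nearly every pick is the top token; at 2.0, the other tokens show up far more often.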

And that’s the whole process. It runs in a loop inside the LLM: the chosen token is appended to the input, and everything repeats until the complete answer has been generated.
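Tying the five steps together, the loop looks roughly like this (`model` and its methods are hypothetical names for this sketch, not a real API; `sample_next_token` is the function from the previous snippet):

```python
def generate(model, prompt, max_new_tokens=50):
    ids = model.tokenize(prompt)        # Steps 1-2: tokenize (embeddings happen inside)
    for _ in range(max_new_tokens):
        logits = model.forward(ids)     # Steps 3-4: attention blocks + linear layer
        next_id = sample_next_token(logits, temperature=0.7)  # Step 5: softmax + pick
        ids.append(next_id)             # feed the new token back into the input
        if next_id == model.eos_token_id:   # stop at the "end of sequence" token
            break
    return model.detokenize(ids)
```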

Some other AI jargon (with a small tokenizer example after the list):

  • Encoder: The process of converting natural language into tokens so the LLM can understand it.

  • Decoder: The process of converting tokens back into natural language to form the response.

  • Vocab Size: The number of unique tokens in the model's dictionary. A larger vocabulary can represent text in fewer tokens, though it also makes the model larger.

  • Knowledge Cutoff: The date up to which the model was trained; it has no built-in knowledge of anything after that date.
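For instance, tiktoken exposes the first three of these directly:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("Who are you?")   # encoder: natural language -> token IDs
text = enc.decode(ids)             # decoder: token IDs -> natural language
print(ids, "->", repr(text))

print(enc.n_vocab)                 # vocab size: number of unique tokens (~100k here)
```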


That’s it. Please feel free to give feedback on the article.

Feel free to connect with me.


Written by Aditya Chaudhary