Decoding AI Jargon with GenAI

It’s a machine with various processes happening inside. We do two kinds of things with this machine: 1) train it on some data, again and again, to get the desired output, and 2) use it by asking it a query and getting an output.
Processes:
user input → tokenization → encoding (token IDs) → vector embeddings (for semantic meaning) → positional encoding → self-attention (or multi-head attention) → repeated across layers to refine
Further, we have two more terms: Linear and Softmax.
Linear: Suppose that for some input our transformer (machine) has more than one possible response, say 4, but it has to provide only one. The linear layer gives each of these 4 candidates a score (a logit), so each one ends up with some probability, and the one with the maximum probability will be the output.
Softmax: It is the mathematical function that converts those scores into probabilities between 0 and 1 that sum to 1, so the model can pick the token with the maximum probability (or sample from the distribution).
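Here is a minimal sketch of that idea (assuming NumPy and made-up logit values and candidate words, not real model output):

```python
import numpy as np

# Made-up logits (scores from the linear layer) for 4 candidate next tokens
logits = np.array([2.0, 1.0, 0.5, -1.0])
tokens = ["chair", "mat", "sofa", "moon"]   # hypothetical candidates

# Softmax: exponentiate and normalise so the scores become probabilities that sum to 1
exp = np.exp(logits - logits.max())
probs = exp / exp.sum()

print(dict(zip(tokens, probs.round(3))))          # each candidate with its probability
print("output:", tokens[int(np.argmax(probs))])   # greedy pick: the max-probability token -> "chair"
```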
Temperature - It’s again a value we set. If we set it high, our GPT model behaves like an extrovert (more random, creative answers); if we set it low, it behaves like an introvert (more focused, predictable answers).
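A rough sketch of what that value actually does (same NumPy setup and made-up logits as above): the logits are divided by the temperature before the softmax, so a high temperature flattens the distribution and a low one sharpens it.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    # Dividing the logits by the temperature before softmax controls how spread out the probabilities are
    scaled = np.asarray(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, -1.0]                         # same made-up scores for 4 candidates
print(softmax_with_temperature(logits, 0.2).round(3))  # low temperature: nearly all probability on one token (introvert)
print(softmax_with_temperature(logits, 2.0).round(3))  # high temperature: probabilities much more even (extrovert)
```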
Let’s understand the process in more detail
We give input to our transformer machine, and it is tokenized and encoded into numbers, basically an array of token IDs. After that, these are converted into vector embeddings so that they carry some semantic meaning. Then we do positional encoding, because two different sentences can contain the same words with only the positions changed. After that comes a process called self-attention, which the model repeats: the words in the sentence interact with each other, pick up context from one another, and update their representations accordingly.
See Example:-
Step 1: Input Text → The cat is sitting on the chair
Step 2: Tokenization → Token IDs: [101, 201, 302, 405, 128, 101, 509] (Note: These are dummy token IDs for illustration)
Step 3: Embedding → Embedding("The") → [0.02, -0.14, ..., 0.25] ……
Step 4: Positional Encoding → Position 0 (The) + PosEncoding(0) … Now the model knows "The cat" is different from "Cat the".
Step 5: Self-Attention →
Each word interacts with every other word to understand context.
For example:
"cat" looks at "sitting" and "chair" → learns it's not just any cat, it's sitting on a chair.
"the" attends to "chair" → learns which object it's specifying.
The model calculates attention scores between all word pairs and updates their representations accordingly.
Step 6: Multi-Head Attention + Layers
Self-attention is done multiple times in parallel (multi-head), so the model can focus on different relationships at once.
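To make these steps concrete, here is a small, self-contained sketch (assuming NumPy, a toy word-level tokenizer with made-up IDs, random untrained embeddings, and a single attention head with no learned weights; a real GPT model learns all of these) that walks through tokenization, embedding, positional encoding, and one round of self-attention:

```python
import numpy as np

np.random.seed(0)
sentence = "The cat is sitting on the chair"
words = sentence.split()

# Step 2: Tokenization -> dummy token IDs (a real model uses a learned subword vocabulary)
vocab = {w: i + 100 for i, w in enumerate(dict.fromkeys(words))}
token_ids = [vocab[w] for w in words]

# Step 3: Embedding -> each token ID becomes a vector (random here, learned in a real model)
d_model = 8
embedding_table = np.random.randn(200, d_model)
embeddings = embedding_table[token_ids]

# Step 4: Positional encoding -> add a position-dependent vector so "The cat" != "Cat the"
positions = np.arange(len(words))[:, None]
dims = np.arange(d_model)[None, :]
angle = positions / (10000 ** (2 * (dims // 2) / d_model))
pos_encoding = np.where(dims % 2 == 0, np.sin(angle), np.cos(angle))
x = embeddings + pos_encoding

# Step 5: Self-attention -> every word attends to every other word and updates itself
scores = x @ x.T / np.sqrt(d_model)                               # similarity between all word pairs
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # softmax per row: attention weights
contextualised = weights @ x                                      # each word's new, context-aware vector

print(token_ids)              # e.g. [100, 101, 102, 103, 104, 105, 106]
print(weights.shape)          # (7, 7): how much each word attends to every word
print(contextualised.shape)   # (7, 8): updated representations after one attention step
```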
A Few More Terms-
1)Knowledge cutoff- The knowledge cutoff is the latest date up to which an AI model has been trained on information.
2) Tokenization - Tokenization is the process of breaking input text into smaller pieces called tokens (see the quick sketch after this list). These can be:
Words
Subwords
Characters
Punctuation marks
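A real tokenizer rarely splits text neatly into whole words; one quick way to see this is the tiktoken library (the tokenizer behind the tiktokenizer tool linked below), shown here with the cl100k_base encoding as an example:

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI models

token_ids = enc.encode("Tokenization breaks text into subwords")
print(token_ids)                              # the real (non-dummy) token IDs, one per token
print([enc.decode([t]) for t in token_ids])   # the piece each ID stands for; note how words can split into subwords
```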
3)Vocab Size- The vocab size is the total number of unique tokens (words, subwords, symbols) that a model can recognize.
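With the same tiktoken encoding, the vocab size can be checked directly:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
print(enc.n_vocab)  # total number of unique tokens this encoding can recognise (~100k)
```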
Some Online Tools-
https://research.google/pubs/attention-is-all-you-need/ , https://tiktokenizer.vercel.app/
#chaicode