#chaicode #GenAi #cohort

What is covered?

What AI is, GPT, knowledge cutoff, the "Attention Is All You Need" paper by Google, how a transformer works, vector embeddings, positional encoding, the self-attention mechanism, multi-head attention, and feed forward.

What is AI?

AI = Data + Algorithm

What is GPT (Generative Pretrained Transformer)?

GPT predicts the next token based on the data it was trained on. It is a transformer model.
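To get a feel for "predict the next token from data you have already seen", here is a minimal sketch that uses simple word-pair counts. This is a much simpler stand-in, not a transformer, and the names `train_bigram` and `predict_next` are made up for this illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Tiny made-up training text (an assumption for this example)
training_text = "the cat sat on the mat and the cat slept"
model = train_bigram(training_text)

print(predict_next(model, "the"))  # most common word after "the" in the training text
print(predict_next(model, "dog"))  # "dog" never appeared in training, so nothing to predict
```

Notice that the toy model has nothing to say about a word it never saw during training, which is the same limitation that a knowledge cutoff creates on a much larger scale.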

Problem: It does not work on real-time, real-world data, because it has a knowledge cutoff.

Knowledge Cutoff

The model does not know anything that happened after the date its pre-training data ends.

How does the transformer model work?

It is beautifully explained in "Attention Is All You Need", a research paper by Google.

A transformer can be broken into several phases:
a. Input Embedding
b. Positional Encoding
c. Self-Attention Mechanism
d. Multi-Head Attention
e. Feed Forward
f. Output Embedding

Encoder

It takes the input and builds a contextual representation of it.

Let’s dive deep into each phase:

💡Input Embedding:

This phase starts with the text that we give to any transformer, say ChatGPT.

Once the input is received, the query is converted into tokens.

Tokenization

This process splits the text into tokens (words or subwords) and maps each token to a number taken from the model's vocabulary.

Note: Every model has its own tokenization system.

Vocabulary

It is a kind of dictionary for each model, where each token (word or subword) is assigned a unique integer ID.

No. of unique tokens = vocabulary size
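To make the idea of a vocabulary concrete, here is a minimal sketch of a toy word-level tokenizer. The helpers `build_vocab` and `encode` are hypothetical names made up for this illustration; real models use subword tokenizers with far larger vocabularies.

```python
def build_vocab(corpus):
    """Assign each unique lowercase word an integer ID, in order of first appearance."""
    vocab = {}
    for sentence in corpus:
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def encode(text, vocab):
    """Convert text into a list of token IDs using the toy vocabulary."""
    return [vocab[word] for word in text.lower().split()]

corpus = ["The cat sat on the mat", "The mat sat on the cat"]
vocab = build_vocab(corpus)

print("Vocab size:", len(vocab))             # number of unique tokens
print(encode("The cat sat on the mat", vocab))
print(encode("The mat sat on the cat", vocab))
```

Both sentences use exactly the same IDs, only in a different order.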

We can also tokenize text and check the vocabulary size using code:

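import tiktoken

# Load the tokenizer (encoding) used by gpt-4o
encoder = tiktoken.encoding_for_model("gpt-4o")
print("Vocab Size:", encoder.n_vocab)

# Convert the text into token IDs, then decode them back to check the round trip
text = "The cat sat on the mat"
tokens = encoder.encode(text)
print(text, tokens)
print(encoder.decode(tokens))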

Output:

Vector Embedding:

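It gives semantic meaning to the tokens. Each token is mapped to a vector of numbers, and tokens with similar meanings end up close to each other. Real embeddings have hundreds or thousands of dimensions; they are reduced to 3D only so that we can visualise them.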

visualisation

3D space visualisation:

What is semantic meaning?

Meaning of a word in a particular context.

For example, "bank" can mean the side of a river, or it can mean a financial institution.
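The idea that similar meanings sit close together can be sketched with cosine similarity. The three-dimensional vectors below are hand-picked for illustration and are not taken from any real model.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings (an assumption for this example)
embeddings = {
    "river": [0.9, 0.1, 0.0],
    "shore": [0.8, 0.2, 0.1],
    "money": [0.0, 0.1, 0.9],
}

sim_river_shore = cosine_similarity(embeddings["river"], embeddings["shore"])
sim_river_money = cosine_similarity(embeddings["river"], embeddings["money"])

print("river vs shore:", round(sim_river_shore, 3))
print("river vs money:", round(sim_river_money, 3))
```

With these vectors, "river" scores much closer to "shore" than to "money", which is the kind of relationship an embedding space is meant to capture.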

💡Positional Encoding

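This tells the model the position of each token in the input.

For example, "the cat sat on the mat" and "the mat sat on the cat".

The tokens are the same for both sentences, so their vector embeddings are also the same. Positional encoding adds position information to each embedding so that the model can tell the two sentences apart.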
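One way to add position information is the sinusoidal scheme from the "Attention Is All You Need" paper. The sketch below assumes a tiny embedding size of 4 and a made-up embedding for "cat"; `embed_at` is a hypothetical helper for this example.

```python
import math

def positional_encoding(position, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd dimensions."""
    encoding = []
    for i in range(d_model):
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

def embed_at(embedding, position):
    """Add the positional encoding to a token embedding, element by element."""
    pe = positional_encoding(position, len(embedding))
    return [e + p for e, p in zip(embedding, pe)]

# Made-up 4-dimensional embedding for the token "cat" (an assumption)
cat_embedding = [0.5, 0.1, -0.3, 0.8]

cat_at_1 = embed_at(cat_embedding, 1)  # "the cat sat on the mat"
cat_at_5 = embed_at(cat_embedding, 5)  # "the mat sat on the cat"

print([round(x, 3) for x in cat_at_1])
print([round(x, 3) for x in cat_at_5])
```

The same token now gets a different vector depending on where it appears in the sentence.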

💡 Self-Attention Mechanism

Tokens talk to each other and update their embeddings based on the surrounding context.

For example, the neighbouring tokens let the token "bank" adjust its meaning to fit the sentence, such as "river bank" versus "bank account".
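The "tokens talk to each other" step can be sketched as scaled dot-product attention. This is a simplified version under stated assumptions: the embeddings are made up, and the learned query, key, and value projections that a real transformer uses are skipped, so each token's own vector plays all three roles.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Replace each token vector with a weighted average of all token vectors."""
    d = len(vectors[0])
    updated = []
    for query in vectors:
        # How strongly this token should listen to every token (including itself)
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in vectors]
        weights = softmax(scores)
        updated.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return updated

# Made-up 2-D embeddings: dimension 0 ~ "nature", dimension 1 ~ "finance"
river_bank = [[1.0, 0.0], [0.5, 0.5]]  # tokens: "river", "bank"
money_bank = [[0.0, 1.0], [0.5, 0.5]]  # tokens: "money", "bank"

bank_near_river = self_attention(river_bank)[1]
bank_near_money = self_attention(money_bank)[1]

print("bank next to river:", bank_near_river)
print("bank next to money:", bank_near_money)
```

The vector for "bank" starts out neutral and then leans towards "nature" or "finance" depending on its neighbour.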

Issue: Plain self-attention has only one head, so it can focus on only one kind of relationship between tokens at a time.

💡Multi-head attention

Several attention heads run in parallel, each focusing on a different aspect of the tokens, and their outputs are then combined.
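A rough sketch of the idea: split every token vector into slices, let each head run attention on its own slice, and join the results. This assumes made-up 4-dimensional vectors and leaves out the learned projection matrices that real multi-head attention applies per head and on the combined output.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(vectors):
    """Single-head scaled dot-product attention without learned weights."""
    d = len(vectors[0])
    out = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in vectors]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return out

def multi_head_attention(vectors, num_heads):
    """Run attention separately on each slice of the vectors, then concatenate."""
    head_size = len(vectors[0]) // num_heads
    head_outputs = []
    for h in range(num_heads):
        start = h * head_size
        head_input = [v[start:start + head_size] for v in vectors]
        head_outputs.append(attention(head_input))
    combined = []
    for t in range(len(vectors)):
        token_vector = []
        for head in head_outputs:
            token_vector.extend(head[t])
        combined.append(token_vector)
    return combined

# Three made-up token vectors of size 4 (an assumption for this example)
tokens = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
output = multi_head_attention(tokens, num_heads=2)
for vec in output:
    print([round(x, 2) for x in vec])
```

Each head only sees its own half of the vector, so the two heads can weigh the same tokens differently.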

💡Feed Forward

It is a small neural network that processes each token's vector after attention and produces the output of the layer.
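A feed-forward block can be sketched as two linear layers with a ReLU in between. The weights below are hand-written assumptions for illustration; in a real model they are learned during training and are much larger.

```python
def relu(x):
    """Keep positive values, replace negative values with zero."""
    return max(0.0, x)

def linear(vector, weights, bias):
    """Apply a linear layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]

def feed_forward(vector, w1, b1, w2, b2):
    """Expand the vector, apply ReLU, then project back to the original size."""
    hidden = [relu(h) for h in linear(vector, w1, b1)]
    return linear(hidden, w2, b2)

# Made-up weights: expand 2 -> 4, then project 4 -> 2
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
b1 = [0.0, 0.0, 0.0, 0.0]
w2 = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.5, 0.0]]
b2 = [0.0, 0.0]

token = [0.75, 0.25]  # a made-up token vector coming out of attention
result = feed_forward(token, w1, b1, w2, b2)
print(result)
```

The same small network is applied to every token independently; mixing information between tokens is the job of attention.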

Note: The cycle of multi-head attention followed by feed forward is repeated many times, once per layer, to build a rich contextual result.

Decoder:

It provides us with the soul of the whole process, i.e. the output.

Some more buzzwords

Inference:

Inference is the process by which a trained model makes predictions or generates outputs based on new, unseen data.

Temperature:

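It controls how much randomness (creativity) is allowed in the response. The higher the temperature, the more varied and creative the output; the lower the temperature, the more predictable it is.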
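A common way to apply temperature is to divide the raw scores by it before converting them into probabilities. The sketch below assumes four made-up scores; `softmax_with_temperature` is a name chosen for this example.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Divide scores by the temperature, then convert them into probabilities."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw scores for four candidate tokens (an assumption)
scores = [4.2, 3.1, 2.0, 1.2]

for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 2) for p in probs])

low = softmax_with_temperature(scores, 0.5)
high = softmax_with_temperature(scores, 2.0)
```

At a low temperature almost all of the probability goes to the top token, while at a high temperature the other tokens get a real chance of being picked.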

Synthesized Data:

Data that is generated by models like ChatGPT.

Note: Models that are trained only on synthesized data can turn out less capable, because they pick up the mistakes of the model that generated the data.

SoftMax:

It is the game changer: the decision maker.

It is a mathematical function that turns raw scores into probabilities.

Let's say the model sees:
"The weather is very"

Token	Score	After Softmax
cold	4.2	0.67 (67%)
hot	3.1	0.22 (22%)
rainy	2.0	0.07 (7%)
sweet	1.2	0.03 (3%)

Here, cold wins as it has the highest probability.
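The softmax calculation itself can be sketched in a few lines, using the raw scores from the example. Picking the single highest probability, as done here, is the simplest decision rule; it is an assumption for this sketch, since models can also sample from the probabilities.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["cold", "hot", "rainy", "sweet"]
scores = [4.2, 3.1, 2.0, 1.2]

probs = softmax(scores)
for token, p in zip(tokens, probs):
    print(f"{token}: {p:.2f}")

# Pick the token with the highest probability
winner = tokens[probs.index(max(probs))]
print("Winner:", winner)
```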

Credits

I would like to thank Hitesh Choudhary sir and Piyush Garg sir for the amazing cohort. Lastly, if any of you reading this want to join me in learning the same, use my code KMRITYUN21567 to get 10% off on all the courses on chaicode.com.

Thank you so much ❣️.

Decoding AI jargons with chai #chaicode
