# Decoding AI Jargon with CHAI (Daily Learning of GenAI – Part 1)


๐ "The Last Library on Earth" โ A Sci-Fi Story About Understanding Machines
๐ Prologue
In 2092, Earth has forgotten how machines once thought. All knowledge is stored inside one last living AI system hidden within the "Last Library on Earth."
One day, a girl named Luna enters.
"I want to learn how you think," she whispers.
"Then learn my language," the AI replies. "Start with the basics: tokens, vectors, and attention."
## Table of Contents

1. Tokenization
2. Vocab Size
3. Embeddings
4. Positional Encoding
5. Vectors
6. Semantic Meaning
7. Transformer
8. Encoder
9. Decoder
10. Self-Attention
11. Multi-Head Attention
12. Softmax
13. Temperature
14. Top-P (Nucleus Sampling)
15. Knowledge Cutoff
## AI Concepts with One-Liner Definitions + Code
### 1. Tokenization

Splits text into small units (tokens) such as words or subwords.

```python
from transformers import AutoTokenizer

# Load the GPT-2 tokenizer and split a sentence into subword tokens
tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("GenAI is powerful"))
```
### 2. Vocab Size

The total number of distinct tokens the model's tokenizer knows.

```python
# GPT-2's vocabulary holds 50,257 tokens
print(len(tok))
```
### 3. Embeddings

Maps each token to a dense vector of numbers.

```python
from transformers import GPT2Model

# The learned token-embedding matrix has shape (vocab_size, hidden_size)
model = GPT2Model.from_pretrained("gpt2")
print(model.wte.weight.shape)  # torch.Size([50257, 768])
```
### 4. Positional Encoding

Adds position information to token embeddings, since attention by itself ignores word order.

```python
import torch

# Position indices 0..9 as a column vector (one row per position)
position = torch.arange(10).unsqueeze(1)
print(position)
```
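To make this more concrete, here is a minimal sketch of the sinusoidal positional encoding from the original Transformer paper; the sequence length of 10 and embedding size of 8 are arbitrary toy values chosen for illustration:

```python
import math
import torch

seq_len, d_model = 10, 8  # toy sizes for illustration
position = torch.arange(seq_len, dtype=torch.float).unsqueeze(1)  # (10, 1)
div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))

pe = torch.zeros(seq_len, d_model)
pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions use sine
pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions use cosine
print(pe.shape)  # torch.Size([10, 8])
```

Every position gets a unique sine/cosine pattern that is added to the token embeddings before attention. (GPT-2 itself uses learned position embeddings, `wpe`, rather than fixed sinusoids.)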
### 5. Vectors

The numeric form of a token or sentence inside the model.

```python
# Run "Hello AI" through GPT-2: one hidden-state vector per token
inputs = tok("Hello AI", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```
### 6. Semantic Meaning

Vectors capture actual meaning, not just surface text, so related tokens end up close together.

```python
from sklearn.metrics.pairwise import cosine_similarity

# Compare the vectors of the first two tokens of "Hello AI"
vec1 = outputs.last_hidden_state[0][0].detach().numpy()
vec2 = outputs.last_hidden_state[0][1].detach().numpy()
print(cosine_similarity([vec1], [vec2]))
```
### 7. Transformer

The full model architecture built around attention.

```python
from transformers import pipeline

# A whole Transformer end to end: prompt in, generated text out
gen = pipeline("text-generation")
print(gen("AI is", max_length=10))
```
### 8. Encoder

Reads the input and converts it into a contextual representation. A small sketch follows below.
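Since the running GPT-2 example is decoder-only, here is a hedged sketch of an encoder using BERT; the `bert-base-uncased` checkpoint is just a convenient public example:

```python
from transformers import AutoModel, AutoTokenizer

# BERT is an encoder-only model: text goes in, context vectors come out
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

enc_inputs = bert_tok("Luna enters the last library", return_tensors="pt")
enc_outputs = bert(**enc_inputs)
print(enc_outputs.last_hidden_state.shape)  # one context vector per input token
```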
### 9. Decoder

Generates the output, token by token, from the encoded input (or directly from the prompt in decoder-only models like GPT-2), as in the sketch below.
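A small sketch of a decoder at work: GPT-2 is decoder-only, so generation is just the decoder repeatedly predicting the next token (the prompt here is purely illustrative):

```python
from transformers import GPT2LMHeadModel

# A decoder with a language-modelling head predicts the next token, again and again
gpt2_lm = GPT2LMHeadModel.from_pretrained("gpt2")
prompt = tok("The last library", return_tensors="pt")
generated = gpt2_lm.generate(prompt["input_ids"], max_new_tokens=10)
print(tok.decode(generated[0]))
```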
### 10. Self-Attention

Lets each token attend to (look at) every other token when building its own representation; see the sketch after this definition.
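Here is a minimal from-scratch sketch of scaled dot-product attention; random tensors stand in for real query, key and value projections, and the sizes are arbitrary:

```python
import torch
import torch.nn.functional as F

# Toy sequence: 4 tokens, each an 8-dimensional vector
q = torch.randn(4, 8)  # queries
k = torch.randn(4, 8)  # keys
v = torch.randn(4, 8)  # values

scores = q @ k.T / (8 ** 0.5)        # how much each token relates to every other token
weights = F.softmax(scores, dim=-1)  # each row becomes a probability distribution
context = weights @ v                # weighted mix of value vectors
print(context.shape)                 # torch.Size([4, 8])
```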
### 11. Multi-Head Attention

Runs several attention heads in parallel so the model can look at the input from different angles at once, as shown below.
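PyTorch has a built-in layer for this; a quick sketch with arbitrary toy sizes (embedding dimension 8, 2 heads, 4 tokens):

```python
import torch
import torch.nn as nn

# Two heads, each attending over a 4-dimensional slice of the 8-dimensional embeddings
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(1, 4, 8)       # (batch, tokens, embedding)
out, attn = mha(x, x, x)       # self-attention: query = key = value
print(out.shape, attn.shape)   # torch.Size([1, 4, 8]) torch.Size([1, 4, 4])
```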
### 12. Softmax

Converts raw scores (logits) into probabilities that sum to 1.

```python
import torch
import torch.nn.functional as F

# Three raw scores become three probabilities that add up to 1
print(F.softmax(torch.tensor([2.0, 1.0, 0.1]), dim=0))
```
### 13. Temperature

Controls how random the output is: higher values flatten the token probabilities, lower values make the model more deterministic.

```python
from transformers import GPT2LMHeadModel

# Generation needs a model with a language-modelling head,
# and temperature only takes effect when sampling is enabled
lm = GPT2LMHeadModel.from_pretrained("gpt2")
output = lm.generate(inputs["input_ids"], do_sample=True, temperature=1.2)
```
### 14. Top-P (Nucleus Sampling)

Samples from the smallest set of tokens whose combined probability reaches the threshold p.
This keeps responses creative while cutting off the long tail of unlikely, low-quality tokens.

```python
output = lm.generate(inputs["input_ids"], do_sample=True, top_p=0.9)
```
### 15. Knowledge Cutoff

The latest date covered by the model's training data.
E.g. a model with a June 2024 cutoff knows nothing about events after that date.
## Learnings & Takeaways

- Machines don't "read" like we do; they tokenize, embed, and attend.
- Understanding these concepts helps you write better prompts and build better apps.
- This is just the beginning.

Follow to continue your journey into the machine mind.