# 🧠 Decoding AI Jargons with CHAI (Daily Learning of GenAI - Part 1) 📖

๐ŸŒŒ "The Last Library on Earth" โ€” A Sci-Fi Story About Understanding Machines

📖 Prologue

In 2092, Earth has forgotten how machines once thought. All knowledge is stored inside one last living AI system hidden within the "Last Library on Earth."

One day, a girl named Luna enters.

"I want to learn how you think," she whispers.

"Then learn my language," the AI replies. "Start with the basics: tokens, vectors, and attention."


🔍 Table of Contents

  1. Tokenization

  2. Vocab Size

  3. Embeddings

  4. Positional Encoding

  5. Vectors

  6. Semantic Meaning

  7. Transformer

  8. Encoder

  9. Decoder

  10. Self-Attention

  11. Multi-Head Attention

  12. Softmax

  13. Temperature

  14. Top-P (Nucleus Sampling)

  15. Knowledge Cutoff


🔢 AI Concepts with One-Liner Definitions + Code

1. Tokenization

Splits text into small units like words or subwords.

from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("GenAI is powerful"))

2. Vocab Size

Total number of tokens a model understands.

print(len(tok))  # 50257 tokens in GPT-2's vocabulary

3. Embeddings

Converts tokens into numeric vectors.

from transformers import GPT2LMHeadModel
model = GPT2LMHeadModel.from_pretrained("gpt2")   # GPT-2 with its language-modeling head
print(model.transformer.wte.weight.shape)         # torch.Size([50257, 768]): one 768-dim vector per token

4. Positional Encoding

Adds position info to token embeddings.

import torch
position = torch.arange(10).unsqueeze(1)    # positions 0, 1, ..., 9, one index per token
print(position)
print(model.transformer.wpe.weight.shape)   # GPT-2's learned position embeddings: torch.Size([1024, 768])

5. Vectors

Numeric form of a token or sentence.

inputs = tok("Hello AI", return_tensors="pt")
outputs = model.transformer(**inputs)    # run the base transformer to get hidden states
print(outputs.last_hidden_state.shape)   # torch.Size([1, 2, 768]): one vector per token

6. Semantic Meaning

Captures actual meaning, not just text.

from sklearn.metrics.pairwise import cosine_similarity
vec1 = outputs.last_hidden_state[0][0].detach().numpy()   # vector for "Hello"
vec2 = outputs.last_hidden_state[0][1].detach().numpy()   # vector for " AI"
print(cosine_similarity([vec1], [vec2]))                  # how close the two meanings are

7. Transformer

The full model architecture using attention.

from transformers import pipeline
gen = pipeline("text-generation", model="gpt2")
print(gen("AI is", max_length=10))

8. Encoder

Reads and converts input into context.
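
A minimal sketch with PyTorch's built-in encoder layer (toy sizes and random input, just to show the idea):

import torch
import torch.nn as nn
# one encoder layer turns 5 token embeddings into 5 context-aware vectors
enc_layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
x = torch.randn(1, 5, 16)   # (batch, tokens, embedding size)
context = enc_layer(x)
print(context.shape)        # torch.Size([1, 5, 16])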

9. Decoder

Generates output from encoded input.
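
Continuing the same toy sketch, a decoder layer attends to its own tokens and to the encoder's output (a simplification; GPT-2 itself is decoder-only):

dec_layer = nn.TransformerDecoderLayer(d_model=16, nhead=4, batch_first=True)
tgt = torch.randn(1, 3, 16)                   # output tokens generated so far
print(dec_layer(tgt, memory=context).shape)   # torch.Size([1, 3, 16]), using context from the encoder sketch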

10. Self-Attention

Allows each token to attend to others.
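
A rough sketch of the scaled dot-product attention underneath, with random toy tensors:

import torch
import torch.nn.functional as F
Q = K = V = torch.randn(1, 5, 16)                # queries, keys, values all come from the same tokens
scores = Q @ K.transpose(-2, -1) / (16 ** 0.5)   # how strongly each token relates to every other token
weights = F.softmax(scores, dim=-1)              # attention weights for each token sum to 1
print((weights @ V).shape)                       # torch.Size([1, 5, 16])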

11. Multi-Head Attention

Enables looking at input from different angles.
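
PyTorch ships this as a ready-made layer; a small sketch reusing the toy tensor x from the encoder example:

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
attn_out, attn_weights = mha(x, x, x)   # self-attention: query = key = value
print(attn_out.shape)                   # torch.Size([1, 5, 16])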

12. Softmax

Converts raw scores into probabilities.

import torch.nn.functional as F
print(F.softmax(torch.tensor([2.0, 1.0, 0.1]), dim=0))   # ~[0.66, 0.24, 0.10]: highest score, highest probability

13. Temperature

Controls randomness in the AI's output: lower values are more focused and predictable, higher values more creative.

output = model.generate(inputs["input_ids"], do_sample=True, temperature=1.2, max_new_tokens=20)
print(tok.decode(output[0]))   # temperature only kicks in when sampling (do_sample=True)
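
Under the hood, temperature simply divides the raw scores before the softmax; a toy illustration:

import torch
import torch.nn.functional as F
logits = torch.tensor([2.0, 1.0, 0.1])
print(F.softmax(logits / 0.5, dim=0))   # low temperature: sharper, more predictable distribution
print(F.softmax(logits / 1.5, dim=0))   # high temperature: flatter, more random distribution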

14. Top-P Sampling

Samples from the smallest set of most-probable tokens whose probabilities add up to the threshold p.
This keeps responses varied and creative while cutting off the long tail of unlikely tokens.

output = model.generate(inputs["input_ids"], do_sample=True, top_p=0.9, max_new_tokens=20)
print(tok.decode(output[0]))
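
A toy sketch of the selection step, assuming the probabilities are already sorted from most to least likely:

import torch
probs = torch.tensor([0.5, 0.3, 0.15, 0.05])      # sorted next-token probabilities
keep = torch.cumsum(probs, dim=0) - probs < 0.9   # cumulative probability *before* each token
print(probs[keep])   # tensor([0.5000, 0.3000, 0.1500]): the smallest set covering at least 90%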

15. Knowledge Cutoff

The latest date covered by the model's training data; the model knows nothing about events after it.

E.g. "GPT-4 has a cutoff of June 2024."


✅ Learnings & Takeaways

  • Machines don't "read" like us; they tokenize, embed, and attend.

  • Understanding these concepts helps build better prompts and apps.

  • This is just the beginning.


โฐFollow to continue your journey into the machine mind.
