Generative AI: Magic Simplified for Wizards-in-Training

Introduction
Imagine you're at Hogwarts, learning to cast new spells. Just as casting a spell creates magic, Generative AI creates new content, such as text and images, using artificial intelligence. It doesn't just find existing information like a spellbook; it conjures entirely new material.
What is Generative AI?
Generative AI is technology that creates something new. It doesn't just search like Google; it actually makes new content, such as text, images, or even simple melodies, using artificial intelligence. You can think of it as conjuring a Patronus or sketching a quick portrait of Dumbledore, all on a computer!
Who Builds vs Who Uses Generative AI?
Generative AI models (like GPT) are crafted by AI researchers who rely on advanced mathematics, statistics, and machine learning techniques. However, application developers and creators, such as web or software developers, don't need to master those deep technical details. They simply use the ready-made AI "spells" (models and APIs) to build new, exciting products. For example, the team behind PostgreSQL built a powerful database engine, and countless developers then used it to create new applications on top of it.
How Does Generative AI Work?
Let's understand this through GPT, which stands for Generative Pre-trained Transformer:
Generative: It makes new things, similar to a wizard creating a new potion.
Pre-trained: It learns from lots of examples beforehand, like studying spells from textbooks.
Transformer: It transforms input into output, like turning words into magic.
For instance, if your input is "My name is," GPT predicts and generates the next word based on what it learned before, like "My name is Harry," if that's common in its training data.
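The next-word idea can be sketched with a toy predictor. This is not how GPT works internally (GPT uses a neural network, not word counts), but it shows the same predict-the-next-word principle on a tiny, made-up training corpus:

```python
from collections import Counter, defaultdict

# A toy "pretraining" corpus -- real models learn from billions of words.
corpus = [
    "my name is harry",
    "my name is ron",
    "my name is harry",
]

# Count which word follows each word across the corpus.
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1

def predict_next(word):
    """Return the word most often seen after `word` during training."""
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("is"))  # "harry" follows "is" twice, "ron" only once
```

Because "harry" was the most common continuation in this tiny corpus, the model predicts it, just as GPT favors continuations that were common in its training data.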
Inside the Transformer Spellbook
Transformers are like magical translators. The core magic here is a concept called "attention," introduced in the famous Google research paper "Attention Is All You Need." Attention helps the AI decide what's important when creating outputs, just like choosing the right ingredients for a potion.
Tokens and Tokenization
In AI, we break words into smaller pieces called "tokens." For example, the word "Hogwarts" might be split into "Hog," "wart," and "s." Each token is then converted into numbers because computers only understand numbers. This process is known as tokenization.
A quick tokenization example using OpenAI's tiktoken library:

```python
import tiktoken

# Get the tokenizer used by GPT-4
enc = tiktoken.encoding_for_model("gpt-4")

text = "My name is Harry Potter"
tokens = enc.encode(text)
print(tokens)  # a list of integer token IDs
```
In AI, a sequence is the order of tokens you feed in (input sequence) and the tokens the model generates (output sequence). Think of chanting the first part of a spell (input sequence) and the next words it suggests (output sequence) to complete the incantation.
Vector Embeddings
After tokens become numbers, they're turned into vector embeddings, like marking each token on the Marauder's Map. Words or tokens with similar meanings stay close together, helping the AI understand context and relationships in your sequences.
Embedding Example:
```python
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.embeddings.create(
    input="My name is Harry Potter",
    model="text-embedding-3-small",
)
print(response.data[0].embedding)  # a long list of floating-point numbers
```
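To see why "staying close together" matters, here is a minimal sketch using tiny hand-made vectors (real embeddings, like those returned above, have far more dimensions). Cosine similarity is a standard way to measure how closely two vectors point in the same direction:

```python
import math

# Tiny hand-made vectors standing in for real embeddings (illustrative
# only -- real embedding vectors have hundreds of dimensions).
embeddings = {
    "wizard": [0.9, 0.8, 0.1],
    "witch":  [0.8, 0.9, 0.2],
    "broom":  [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Similarity of direction: 1.0 means identical, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Related words score higher than unrelated ones.
print(cosine_similarity(embeddings["wizard"], embeddings["witch"]))
print(cosine_similarity(embeddings["wizard"], embeddings["broom"]))
```

"wizard" and "witch" score close to 1.0 while "wizard" and "broom" score much lower, which is exactly the "close on the Marauder's Map" intuition.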
Positional Encoding
The order of words matters greatly. Consider these two sentences:
"Harry defeats Voldemort."
"Voldemort defeats Harry."
The order changes the meaning drastically. Positional encoding helps AI recognize this by adding position-specific information to embeddings, much like marking steps clearly in a spell.
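As a sketch, here is the sinusoidal positional encoding described in "Attention Is All You Need" in plain Python (the dimension size d_model=8 is just a small illustrative choice; real models use much larger vectors):

```python
import math

def positional_encoding(position, d_model=8):
    """Sinusoidal positional encoding: even indices use sine, odd use
    cosine, at frequencies that decrease across the vector."""
    encoding = []
    for i in range(d_model // 2):
        angle = position / (10000 ** (2 * i / d_model))
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding

# Each position gets a distinct pattern, so "Harry defeats Voldemort"
# and "Voldemort defeats Harry" produce different combined embeddings.
print(positional_encoding(0))
print(positional_encoding(1))
```

These position vectors are added to the token embeddings, so the same word at position 1 and position 3 looks different to the model.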
Self-Attention and Multi-Head Attention
Words can have different meanings depending on context. Take "charm":
"Wingardium Leviosa is a levitation charm."
"Hermione has great charm."
Self-attention helps the AI understand the meaning based on context, allowing each word to influence the understanding of others.
Multi-head attention is like observing several magical scenarios simultaneously—like noticing details of a Quidditch match, such as Harry chasing the Snitch, Gryffindor fans cheering, and the game's score all at once.
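Here is a toy single-head self-attention sketch in plain Python, with the simplifying assumption that queries, keys, and values are all the raw embeddings themselves (a real Transformer learns separate projection matrices for each, and multi-head attention runs several of these in parallel):

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Toy self-attention: each token's output is a weighted mix of
    every token's vector, weighted by scaled dot-product similarity."""
    d = len(embeddings[0])
    outputs = []
    for query in embeddings:
        # Attention scores: scaled dot product of the query with each token.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in embeddings
        ]
        weights = softmax(scores)
        # Weighted sum of the (toy) value vectors.
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, embeddings))
            for i in range(d)
        ])
    return outputs

# Three made-up 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]
print(self_attention(tokens))
```

Each output row blends information from all three tokens, which is how context (a levitation charm vs. Hermione's charm) flows into each word's representation.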
Training and Prediction
During training, AI learns by predicting the next word in a sentence and then adjusting itself when it makes a mistake (called backpropagation). This is like a student wizard practicing a spell again and again until it's perfect.
During inference (when you actually use the model), it immediately provides its best guess for the next token without further adjustment—like casting a well-rehearsed spell in a single try.
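The predict-then-adjust training loop can be sketched with a single made-up weight. Real models adjust billions of weights via backpropagation, but the loop (predict, measure the error, nudge the weights) is the same idea:

```python
# Toy "training": learn a weight w so that the prediction w * x
# matches the targets. The data follows y = 2x, so w should land near 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0              # start with a bad guess
learning_rate = 0.05

for step in range(200):          # practice the spell again and again
    for x, y in data:
        prediction = w * x
        error = prediction - y
        w -= learning_rate * error * x   # gradient of squared error wrt w

print(round(w, 3))  # converges close to 2.0
```

At inference time there is no adjustment step at all: the loop above is gone, and the model just computes its prediction once.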
Developers using these models don’t need to understand the complex math behind backpropagation. They can simply call an API to generate text or images and build their own magical applications.
Quick Summary
Generative AI transforms input into meaningful new output through:
- Words → Tokens → Numbers → Embeddings → Positional Encoding → Attention → Output
Generative AI, like magic at Hogwarts, combines multiple steps to create meaningful and exciting new content!
Written by Manthan Singh Shekhawat