How Generative Pretrained Transformers Power Generative AI Magic


Imagine a super-smart computer program that can write stories, answer questions, or even help you code, all while sounding like a human. That’s what GPT, or Generative Pretrained Transformer, is all about. Whether you’re chatting with ChatGPT, Claude, or Gemini, you’re interacting with the power of GPT.

It’s the tech powering tools like ChatGPT, Gemini, Claude, Llama, and more, and it’s a big deal in the world of artificial intelligence (AI). Don’t worry if that sounds like more tech jargon; I’m here to explain it in simple language, with a dash of fun.

What’s GPT?

You might think GPT stands for “Get Paid Today” or maybe “Greatest Programming Tech”. But officially, it means Generative Pre-trained Transformer. Sounds fancy, but break it down and it’s just a super-smart system that learns patterns from massive amounts of data and then generates stuff like words, code, or answers based on what it learned.

But here’s the twist: GPT isn’t just an OpenAI/ChatGPT thing. It’s the technology powering many AI systems, like Anthropic’s Claude and Google’s Gemini. Think of it as the secret sauce behind all your favourite AI chatbots.

Let’s break down the name:

  • Generative: It creates stuff like text, code, even poetry!

  • Pre-trained: It’s been fed a massive buffet of data (think billions of words) to learn patterns.

  • Transformer: A fancy neural network that transforms input (your prompt) into output (a witty response).

💡
Fun Fact: Google introduced transformers in their 2017 paper Attention Is All You Need, using them for Google Translate. OpenAI took it to the next level, turning transformers into the brain of large language models (LLMs).

So... what if I told you that the same kind of tech used to translate “hello” into “hola” is now writing essays, generating code, and maybe even helping your boss write your performance review?


Why Is GPT a Big Deal?

GPT is a game-changer because it’s built on something called a transformer. Imagine a transformer as a magical machine that takes one thing and turns it into something else.

💡
A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence.

In GPT’s case, the transformer predicts the next word in a sequence and keeps looping: it takes the existing words, predicts what comes next, appends that prediction, and repeats until it has crafted a coherent response that matches the patterns it learned during training on massive datasets.
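In rough Python, that loop looks something like the sketch below. The predict_next_token function here is a hypothetical stand-in for the entire model; the point is just the feed-the-output-back-in loop:

# A conceptual sketch of the next-word prediction loop (not a real model).
# predict_next_token is a hypothetical placeholder for a full GPT forward pass.
def generate(prompt_tokens, predict_next_token, max_new_tokens=50, stop_token="<END>"):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = predict_next_token(tokens)   # guess the most likely next token
        if next_token == stop_token:              # stop when the model decides it's done
            break
        tokens.append(next_token)                 # feed the prediction back in and repeat
    return tokens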

Since OpenAI released the first GPT model in 2018, it’s sparked a wave of AI innovation. Other companies, like Anthropic (Claude) and Google (Gemini), have jumped in with their own transformer-based models. Even Microsoft’s Copilot uses GPT tech. It’s like the AI world’s favorite recipe; everyone’s cooking with it!

So… a machine that just predicts the next word is going to take all our jobs and bring about the end of humanity? Sounds wild, right? But that’s the hype (and fear) around GPTs. While it’s true these models are getting scarily good at tasks like writing, coding, and analyzing, they’re still just very advanced next word predictors, not sentient beings plotting world domination. For now, at least. 😅


How Does GPT Work? The 4-Step Recipe

Let’s cook up some GPT magic! You give it a sentence or a question, and it predicts what comes next, word by word, until it builds a full response. Here’s how it pulls off this magic trick, turning your prompt into a masterpiece step by step:

Step 1: Breaking Words into Tokens

When you type something like “The quick brown fox jumps over the lazy Dog.”, GPT chops it into smaller pieces called tokens. These could be whole words (like "The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "Dog") or parts of words (like "The", "qui", "ck", "brown", "f", "ox", ...). These tokens form a sequence.


Step 2: Tokenization (Turning Tokens into Numbers)

Once the text is broken into tokens, each token is converted into a number—kind of like assigning every word (or piece of a word) a special ID. This is based on the model’s vocabulary, which is a giant dictionary of tokens it knows.

💡
Fun fact: Every GPT model has its own unique tokenization algorithm and vocabulary, so the same sentence might get sliced and diced differently depending on the model you're using.

Let’s Code: Tokenizing like GPT

Now that we've explored the theory behind tokenization, let’s see it in action!
In this snippet, we’ll use OpenAI’s tiktoken library to tokenize a sentence the same way GPT models do under the hood: breaking it into tokens, converting them into numbers, and then reversing the process to reconstruct the original sentence.

# Import the tiktoken library (used to tokenize text for OpenAI models)
import tiktoken

# Get the encoder/tokenizer configuration for a specific OpenAI model (in this case, gpt-4o)
encoding = tiktoken.encoding_for_model("gpt-4o")

# The input text to tokenize
text = "The Quick brown fox jumps over the lazy dog"
print("Text:", text, "\n")

# Encode the text into tokens
encoded_tokens = encoding.encode(text)
print("Encoded Tokens:", encoded_tokens, "\n")

# Decode the tokens back into the original string 
decoded_text = encoding.decode(encoded_tokens)
print("Decoded Text:", decoded_text, "\n")
Text: The quick brown fox jumps over the lazy Dog. 

Encoded Tokens: [976, 4853, 19705, 68347, 65613, 1072, 290, 29082, 18018, 13]

Decoded Text: The quick brown fox jumps over the lazy Dog.

Step 3: Finding Vector Embeddings

Words carry meaning, and vector embeddings capture these meanings by representing words as points in a multi-dimensional space. For example, “dog” and “puppy” are positioned close together because they’re semantically similar, while “dog” and “rocket” are far apart.

This helps GPT understand the context and relationships between words, so it knows that “quick” describes the fox and not the dog.
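To see what “close together” means in practice, here’s a tiny sketch using made-up 4-dimensional vectors (purely illustrative, not real GPT embeddings) and cosine similarity, the usual way of measuring how alike two embeddings are:

import numpy as np

def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point the same way (similar meaning); near 0 means unrelated
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical toy embeddings, invented just to illustrate the geometry
dog    = np.array([0.90, 0.80, 0.10, 0.05])
puppy  = np.array([0.85, 0.75, 0.20, 0.10])
rocket = np.array([0.05, 0.10, 0.90, 0.80])

print("dog vs puppy: ", round(cosine_similarity(dog, puppy), 2))   # high: close together
print("dog vs rocket:", round(cosine_similarity(dog, rocket), 2))  # low: far apart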

To make this easier to picture, imagine the vector embeddings plotted in a simplified 3D space. In real applications, these vectors usually exist in much higher-dimensional spaces, often between 100 and 4,000 dimensions, allowing models to capture subtle and complex relationships between words or concepts.

💡
Fun Fact: In lower dimensions, similar words are placed closer together, but the model may miss subtle differences between them. Higher dimensions give more space to separate words, helping the model understand small but important distinctions more accurately.

In real-world applications, vector embeddings are represented in complex, high-dimensional spaces, like the example shown below:

Let’s Code: Creating Vector Embeddings like GPT

# Import the OpenAI client and dotenv to load environment variables
from openai import OpenAI
from dotenv import load_dotenv

# Load environment variables from the .env file (To load your API key)
load_dotenv()

# Initialize the OpenAI client (API key is auto-loaded from environment)
client = OpenAI()

# Input text to generate embeddings for
text = "The quick brown fox jumps over the lazy Dog."

# Request embeddings using OpenAI's 'text-embedding-3-small' model
embedding_response = client.embeddings.create(
    input=text,
    model="text-embedding-3-small"
)

# Print the raw vector embedding
print("Vector Embeddings:", embedding_response.data[0].embedding, "\n")

# Print the dimensionality of the embedding vector
print("Dimensions:", len(embedding_response.data[0].embedding), "\n")
Vector Embeddings: [-0.005960469134151936, -0.0071602207608520985, -0.0011407214915379882, -0.055188581347465515, -0.023497266694903374, 0.02807929739356041, ...] (1,536 values; truncated here for readability)

Dimensions: 1536

OpenAI’s embedding models return:

  • 1536-dimensional vectors for text-embedding-3-small model.

  • 3072-dimensional vectors for text-embedding-3-large model.

You can reduce the size by setting the dimensions parameter without losing core meaning.
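For example, here’s a sketch (reusing the client from the snippet above) that requests a smaller 256-dimensional embedding; 256 is just an example size:

# Ask for a smaller vector via the 'dimensions' parameter (supported by the text-embedding-3 models)
small_response = client.embeddings.create(
    input="The quick brown fox jumps over the lazy Dog.",
    model="text-embedding-3-small",
    dimensions=256   # example size; pick whatever fits your storage and accuracy needs
)

print("Dimensions:", len(small_response.data[0].embedding))  # 256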


Step 4: Positional Encoding

While vector embeddings help GPT understand the meaning of words, they don’t tell the model where a word appears in a sentence. That’s where positional encoding comes in.

Since GPT doesn’t process language in order like humans do (it sees everything as a set of vectors), it needs a way to understand the position of each word. Positional encoding adds a unique pattern to each word’s embedding based on its position in the sentence.

These patterns are generated using mathematical functions like sine and cosine. This way, even if two words have similar meanings, GPT can tell which one comes first, second, third, and so on.

For example, in the sentence “The quick brown fox jumps over the lazy Dog.”

  • Without positional encoding:
    All the word embeddings are like floating bubbles; GPT doesn’t know which came first.

  • With positional encoding:
    Each bubble now has a label: position 0, position 1, position 2...
    The model can now understand that “quick” comes before “fox,” and “lazy” modifies “dog.”
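Here’s a minimal NumPy sketch of the sine/cosine positional encoding described above. It follows the recipe from the original transformer paper; it’s an illustration of the idea, not a copy of any specific GPT model’s internals (many GPT-style models actually learn their position embeddings instead):

import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    # Build one unique sine/cosine pattern per position, as in "Attention Is All You Need"
    positions = np.arange(seq_len)[:, np.newaxis]              # shape (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]                   # shape (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                      # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                      # odd dimensions use cosine
    return pe

# 9 words in "The quick brown fox jumps over the lazy Dog", toy 8-dimensional embeddings
pe = sinusoidal_positional_encoding(seq_len=9, d_model=8)
print(pe[0])   # the pattern added to the embedding at position 0 ("The")
print(pe[1])   # a different pattern at position 1 ("quick"), so word order is never lost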


The Secret Sauce – Self-Attention & Multi-Head Attention:

  1. Self-Attention: Understanding What Matters Most

    Self-attention allows GPT to look at every word in the sentence at once and decide which words are most important for understanding each word's meaning. It’s like giving each word a spotlight to highlight the words it should "pay attention" to.

    Example:
    In “The quick brown fox jumps over the lazy dog,” when processing the word “jumps,” self-attention helps GPT focus on “fox” (who's doing the jumping) and “dog” (who's being jumped over), rather than giving equal weight to every word like “the” or “over.”

    This mechanism helps GPT understand relationships, roles, and structure, no matter where the words appear in the sentence.

  2. Multi-Head Attention: Multiple Perspectives at Once

    Now imagine not just one attention mechanism, but many running in parallel. That’s multi-head attention.

    Each head looks at the sentence from a different angle: one might focus on grammatical structure, another on meaning, another on subject-object relationships. Then all these "perspectives" are combined, giving GPT a deep, comprehensive understanding of the sentence.

    Back to our example:

    • One head might connect “fox” with “jumps.”

    • Another might link “lazy” to “dog.”

    • Another might understand that “quick” describes “fox.”

Together, they help GPT build a rich, layered understanding of the sentence by capturing grammar, meaning, and context all at once.
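To make this concrete, here’s a minimal NumPy sketch of single-head scaled dot-product attention, the core operation behind self-attention. In a real GPT, the queries, keys, and values come from learned projection matrices and many heads run in parallel; here one tiny random matrix stands in for all three, just to show the mechanics:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score every token against every other token, softmax the scores, then mix the values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # relevance of each token to each other token
    scores -= scores.max(axis=-1, keepdims=True)     # subtract the max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row now sums to 1
    return weights @ V, weights

# Toy example: 4 tokens (say "fox", "jumps", "lazy", "dog"), each a random 8-dimensional vector
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
output, weights = scaled_dot_product_attention(x, x, x)   # reuse x as Q, K, and V for simplicity
print(weights.round(2))   # row i shows how much token i "pays attention" to every token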


The Two Phases of GPT’s Life

  1. Training Phase: Learning the Language of the Universe

    In this phase, GPT is like a curious student soaking in knowledge from billions of text sources—books, articles, websites, code, conversations, and more. Its goal? Just to master the art of predicting the next word in a sentence.

    But how does it learn?
    Through a process called backpropagation, a kind of feedback loop.
    Every time GPT guesses the next word, it checks how close (or far) it was from the correct one. If it’s wrong, it adjusts the “weights” in its neural network slightly, just like correcting its intuition. This happens millions of times across endless examples, gradually making GPT smarter with each step.

    Think of it like studying: make a mistake, learn from it, and try again and again, until you almost always get it right.

  2. Testing Phase: Time to Shine

    Once training is complete, GPT is ready to respond.

    You give it a prompt—a question, a sentence, or any input—and GPT uses everything it has learned to generate a meaningful response.

    It doesn’t search the internet or copy answers. Instead, it uses the patterns and knowledge it absorbed during training to create a response that fits your input.

    This is the phase where GPT applies what it knows to help, inform, or complete tasks in real time.
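For instance, here’s a minimal sketch of that prompt-in, response-out flow, reusing the same OpenAI client setup as the earlier snippets (the model name is just an example; any chat-capable model works):

# Send a prompt and let the model generate a response, predicting it token by token behind the scenes
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()        # loads your API key from the .env file, as before
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # example model; swap in whichever chat model you have access to
    messages=[{"role": "user", "content": "Finish this sentence: The quick brown fox..."}],
)

print(response.choices[0].message.content)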


Bringing It All Together:

GPT is the smart engine behind AI chat models like ChatGPT, Claude, and Gemini, making them feel like the smartest friend you’ve ever had. It breaks your words into tiny pieces, understands what they mean, and figures out what should come next, all thanks to being trained on tons of books, websites, and code.

Whether it's writing a story, helping with homework, or building an app, GPT’s transformer magic makes it all feel easy.

So next time you chat with an AI, remember: it’s not just clever replies; it’s a powerful mix of language, learning, and a lot of tech quietly working together to turn your ideas into something amazing.

