Generative AI Using Python


What is Generative AI?
Generative AI (GenAI) refers to a subset of artificial intelligence that focuses on generating new content such as text, images, audio, video, or code. It "generates" new data based on the patterns it learned from existing data during training.
Misconception
A common misconception is that you need deep knowledge of math, statistics, linear equations, regression, probability, calculus, vectors, or matrices to work on GenAI, but that's not entirely true.
While these concepts form the foundation of how GenAI models are built and trained, you don’t need to master them to use or apply GenAI effectively.
What really matters is your ability to:
“Understand how to frame a problem for an AI solution”
“Practice prompt engineering to guide AI behavior”
“Integrate AI tools into real-world workflows or products”
Note :- GenAI is no longer just for data scientists; it's for everyone who's curious and ready to innovate.
History
Generative AI was not developed by a single company. It's a field of research that evolved over decades, built on contributions from tech giants and open-source communities.
However, some companies are pioneers and leaders in making Generative AI accessible and mainstream :-
OpenAI
Google Deepmind
MetaAI
Microsoft, and many more.
Among these, OpenAI is the key player behind GPT and ChatGPT.
What is GPT?
GPT stands for "Generative Pre-trained Transformer". It's a large language model developed by OpenAI, a company based in San Francisco. Initially designed to generate human-like text, it can now:
Understand and answer questions
Generate essays, code, stories, emails, etc.
Translate languages, summarize content, and more
Transformer
A Transformer is, at its core, a model that transforms data: it takes an input, performs a series of computations, and turns it into an output. In Generative AI it is used generatively, producing new content based on patterns learned from pre-training data.
In a Transformer model, the system takes an input and predicts the next word (or token) based on the context.
For example, if the input is "My name is Twinkle", the model might predict the next word as "Goyal".
It then appends this predicted word to the input — making it “My name is Twinkle Goyal” — and continues predicting the next word, step by step.
In the context of Generative AI, the Transformer architecture enables models like GPT to understand context, learn relationships between words, and generate human-like responses.
So while the term "Transformer" might suggest a simple input-to-output conversion, this process of sequential prediction and generation is how the model constructs coherent sentences, stories, code, and more, one token at a time, based on learned patterns from its training data.
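To make that loop concrete, here is a toy sketch of the process. The predict_next_token lookup below is purely hypothetical, a stand-in for a real model's billions of learned parameters:

def predict_next_token(text):
    # Hypothetical lookup table standing in for a real language model.
    continuations = {
        "My name is": " Twinkle",
        "My name is Twinkle": " Goyal",
        "My name is Twinkle Goyal": ".",
    }
    return continuations.get(text, "")

text = "My name is"
while True:
    token = predict_next_token(text)
    if not token:
        break
    text += token  # append the prediction and predict again
print(text)  # -> My name is Twinkle Goyal.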
Why GPUs are in such high demand
GPUs are in high demand because Generative AI models require massive parallel processing to perform complex calculations quickly.
Predicting the next token involves billions of matrix operations and GPUs are optimised to handle this kind of compute-intensive workload efficiently.
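As a rough, hands-on illustration, the sketch below times the same large matrix multiplication on the CPU and, if one is available, on a CUDA GPU. It assumes PyTorch is installed (pip install torch):

import time

import torch

a = torch.randn(2048, 2048)
b = torch.randn(2048, 2048)

start = time.time()
_ = a @ b  # matrix multiplication on the CPU
print(f"CPU: {time.time() - start:.4f}s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()  # wait for the transfer to finish before timing
    start = time.time()
    _ = a_gpu @ b_gpu  # the same multiplication, parallelised across GPU cores
    torch.cuda.synchronize()
    print(f"GPU: {time.time() - start:.4f}s")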
Note :- Google researchers introduced the Transformer architecture in 2017 in a landmark paper titled "Attention Is All You Need". It later became the foundation for modern large language models like GPT, BERT, T5, PaLM, LLaMA, Claude, and others, and it massively improved training speed, scalability, and parallelism.
In the original Transformer model, "Hello" could be translated to "Namaste" by learning language mappings.
Later, other AI models built on this architecture by adding new algorithms, enabling it not just to translate but also to predict the next token for tasks like text generation, summarization, and conversation.
Steps involved in transforming input
Tokenization
Tokenization is the process of breaking input text into smaller, manageable units called tokens so that a language model like GPT can understand and process it. Each token is then mapped to a numerical ID.
This numerical representation enables AI to analyse text, identify patterns, and generate human-like responses.
Let’s review the code for tokenization. Below is the Python code that encodes normal text into tokens, which can then be analyzed by the AI system.
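A minimal sketch, assuming OpenAI's tiktoken library (pip install tiktoken) and its o200k_base encoding, the tokenizer used by GPT-4o models; the exact token IDs depend on which encoding you choose:

import tiktoken

encoding = tiktoken.get_encoding("o200k_base")

text = "Hello, World! I am Twinkle Goyal"
tokens = encoding.encode(text)     # text -> list of integer token IDs
decoded = encoding.decode(tokens)  # token IDs -> original text

print("Tokens :", tokens)
print("Decoded Text :", repr(decoded))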
Output :-
Tokens : [13225, 11, 5922, 0, 357, 939, 7077, 47817, 499, 11616]
Decoded Text : 'Hello, World! I am Twinkle Goyal'
In AI tokenization, key vocabulary includes:
Tokens :- (the basic units: words, subwords, or characters)
Tokenization :- (the process of breaking text into these units)
Tokenizers :- (the algorithms that perform tokenization, like Byte Pair Encoding or WordPiece)
Vocabulary :- (the set of all unique tokens a model knows)
Corpus :- (the large body of text from which the vocabulary is built).
Tokenizers use these vocabulary items to convert text into a numerical format, enabling AI models to understand and process human language.
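As a small follow-up sketch (again assuming tiktoken), you can inspect the size of an encoding's vocabulary and see how a longer word is split into sub-word tokens:

import tiktoken

encoding = tiktoken.get_encoding("o200k_base")
print("Vocabulary size :", encoding.n_vocab)  # number of unique tokens the encoding knows

for token_id in encoding.encode("Tokenization"):
    print(token_id, repr(encoding.decode([token_id])))  # each sub-word piece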
Vector Embeddings
Vector embeddings are numerical representations of data that capture semantic relationships and similarities, transforming complex data like text or images into a format suitable for machine learning models.
They are typically stored in a vector database, which uses them to provide high-quality, similarity-based search results.
Semantic searches (as well as question answering) are searches by similarity, such as by the meaning of text, or by what objects are contained in images.
For example, consider a library of wine names and descriptions, one of which mentions that the wine is "good with fish". A "wine for seafood" keyword search, or even a synonym search, won't find that wine. A meaning-based search should understand that "fish" is similar to "seafood", and that "good with X" means the wine is "for X", and should find the wine.
Vector embeddings example
For example, in the case of text data, “cat” and “kitty” have similar meaning, even though the words “cat” and “kitty” are very different if compared letter by letter. For semantic search to work effectively, embedding representations of “cat” and “kitty” must sufficiently capture their semantic similarity. This is where vector representations are used, and why their derivation is so important.
In practice, vector embeddings are arrays of real numbers, of a fixed length (typically from hundreds to thousands of elements), generated by machine learning models. The process of generating a vector for a data object is called vectorization. Weaviate generates vector embeddings using integrations with model providers (OpenAI, Cohere, Google PaLM etc.), and conveniently stores both objects and vector embeddings in the same database. For example, vectorizing the two words above might result in the following word embeddings:
cat = [1.5, -0.4, 7.2, 19.6, 3.1, ..., 20.2]
kitty = [1.5, -0.4, 7.2, 19.5, 3.2, ..., 20.8]
These two vectors have a very high similarity. In contrast, vectors for “banjo” or “comedy” would not be very similar to either of these vectors. To this extent, vectors capture the semantic similarity of words.
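A minimal sketch of how that similarity is usually measured, using cosine similarity; the short vectors below are made-up toy values, not real embeddings:

import numpy as np

def cosine_similarity(a, b):
    # 1.0 means identical direction; values near 0 mean unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

cat   = np.array([1.5, -0.4, 7.2, 19.6, 3.1])
kitty = np.array([1.5, -0.4, 7.2, 19.5, 3.2])
banjo = np.array([8.1, 5.2, -0.3, 0.7, 12.9])

print("cat vs kitty :", round(cosine_similarity(cat, kitty), 4))  # close to 1.0
print("cat vs banjo :", round(cosine_similarity(cat, banjo), 4))  # much lower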
Let's review the code for vector embeddings. Below is the Python code to get the embedding vector (a list of floating-point numbers) along with some additional metadata. You can extract the embedding vector, save it in a vector database, and use it for many different use cases.
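A minimal sketch, assuming the official openai Python library (pip install openai) and an OPENAI_API_KEY environment variable:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello, World!",
)

# The response contains the embedding vector plus usage metadata.
print(response.model_dump_json(indent=2))

Output :-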
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        -4.547132266452536e-05,
        -0.024047505110502243
      ]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
Positional Encoding
A positional embedding is similar to a word embedding, but instead of encoding meaning it indexes a token's position in the input sequence. Positional encoding is the technique that adds information about the position of each token in the sequence to the input embeddings. This helps transformers understand the relative or absolute position of tokens, which is important for differentiating between words in different positions and for capturing the structure of a sentence. Without positional encoding, transformers would struggle to process sequential data effectively.
Basic sentence :- "The cat sat on the mat."
Before the sentence is given to the Transformer model, it gets tokenised, and each word is converted into a token. Let's assume the tokens for this sentence are: ["The", "cat", "sat", "on", "the", "mat"]
After that, each token is mapped to a high-dimensional vector representation through an embedding layer. These embeddings encode semantic information about the words in the sentence. However, they lack information about the order of the words.
Embeddings = {E1, E2, E3, E4, E5, E6}
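Here is a minimal sketch of the sinusoidal positional encoding proposed in "Attention Is All You Need" (assuming numpy), applied to the six tokens above; each position gets a distinct vector that is added to the token's embedding:

import numpy as np

def positional_encoding(seq_len, d_model):
    positions = np.arange(seq_len)[:, np.newaxis]  # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]       # (1, d_model)
    angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])  # even dimensions use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])  # odd dimensions use cosine
    return encoding

tokens = ["The", "cat", "sat", "on", "the", "mat"]
pe = positional_encoding(len(tokens), d_model=8)
for token, row in zip(tokens, pe):
    print(f"{token:>4} -> {np.round(row, 2)}")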
Importance of positional encoding
Contextual Understanding - In natural language, the meaning of a word depends on its position, so positional information helps the model understand these differences.
Better Generalisation - It allows transformer models to handle input sequences of different lengths, making them more flexible for tasks like document summarization or question answering.
Preventing Symmetry Issues - Without positional encoding, the model would treat a token the same way regardless of where it appears, which causes problems. With positional encoding, tokens at different positions are treated differently, improving the model's ability to capture long-range dependencies.
Self-Attention
Self-attention helps the model understand the relationships and dependencies between words or tokens, improving its ability to capture context and meaning.
Example
The River Bank
The ICICI Bank
In "The river bank", bank refers to the sloping land that runs along the sides of a river, confining the water and separating it from the surrounding level ground.
In "The ICICI Bank", bank refers to an organisation that keeps money safe for its customers.
So the vector embedding for "bank" should be different in the two cases, based on the surrounding words or tokens. This is self-attention: relationships are built between tokens to generate the proper embedding for each token, as sketched below.
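A minimal sketch of scaled dot-product self-attention, assuming numpy. The toy embeddings are made-up values, and for simplicity the queries, keys, and values are the embeddings themselves (a real model learns separate Q/K/V projection matrices):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(X):
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)  # similarity of every token with every other token
    weights = softmax(scores)        # attention weights; each row sums to 1
    return weights @ X               # context-aware token representations

# Toy 4-dimensional embeddings for ["The", "river", "bank"]
X = np.array([
    [0.1, 0.0, 0.2, 0.1],
    [0.9, 0.1, 0.8, 0.2],
    [0.4, 0.7, 0.3, 0.9],
])
print(self_attention(X).round(3))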
Multi-Head Attention
Multi-head attention helps the model understand the relationships and dependencies between words and tokens from multiple perspectives at once, improving its ability to capture context and meaning more accurately.
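And a minimal sketch of multi-head attention (again assuming numpy): the embedding is split into heads, each head attends independently and captures a different aspect of the relationships, and the results are concatenated back together:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(X, num_heads):
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    outputs = []
    for h in range(num_heads):
        head = X[:, h * d_head:(h + 1) * d_head]  # this head's slice of each embedding
        scores = head @ head.T / np.sqrt(d_head)
        outputs.append(softmax(scores) @ head)    # each head attends independently
    return np.concatenate(outputs, axis=-1)       # recombine heads into d_model dims

X = np.random.randn(6, 8)  # 6 tokens, 8-dimensional toy embeddings
print(multi_head_attention(X, num_heads=2).shape)  # -> (6, 8)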
This is just the beginning of exploring Generative AI with Python. In my upcoming blogs, I’ll dive deeper into practical implementations and advanced concepts. Stay tuned!
Thanks!