Understanding How AI Agents Work


What is an AI Agent?

An AI agent is a tool that helps the user directly get the answer or info they are looking for.

If we talk only in the context of coding: before AI came along, coding was like an open-book exam. Writing code is like answering the questions in the test, and if you get stuck you just open the book or go to the internet and find the answer. But even though it is an open-book exam, you must at least have a basic idea of where to look and whether the available answer suits your code or not; otherwise it is like looking up a math query and checking the chemistry answers.

What changed after AI is that we all got our own guide who directly gives us the answer, along with relevant links to where the answer came from, for reference.

So we can say that an AI agent is more of a personal guide, which we can use in any way, any time we want.

Workflow of AI Agents:-

So we have seen that we can use the AI agent in many places, more like a guide. Now let's see how this guide works behind the scenes to help us.

Let's go with the example of the open-book test, where we use our guide, the AI agent, to clear the test.

So we are stuck on some questions, and now we need the help of our guide to get the answers to those questions.

So the first question is: does our guide need to look at the whole book?

The answer is no. The book may be very long, and the book is just an example; the data can be huge, and the machine/AI agent can't look at all of it at once. There is a limit to how much it can look at at a time, and that limit is called the Context Window. It depends on the LLM: some have a big window and some have a small one.
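Just to get a feel for the numbers, here is a minimal sketch (it assumes the tiktoken tokenizer library, which is not used anywhere else in this article) that counts how many tokens a piece of text would take, so you can see whether it would fit in a given context window:

import tiktoken  # assumption: any tokenizer works here, this one is just for illustration

# Context windows are measured in tokens, not characters
encoding = tiktoken.get_encoding("cl100k_base")

# Pretend this is the whole book
book_text = "The mitochondria is the powerhouse of the cell. " * 20_000

token_count = len(encoding.encode(book_text))

# Hypothetical model with a 128k-token context window
context_window = 128_000
print(f"Book tokens: {token_count}, fits in one go: {token_count <= context_window}")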

So the question now is: the data is almost always larger than the context window, so how can we store it and still get the answer?

The answer is that we divide the book into small chunks, say pages or even smaller pieces (for example, putting about 1000 characters at a time together into a list). That way we do not need to look through the whole data to find the answer.
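As a rough illustration of the idea (plain Python, not the LangChain splitter used in the real code below), chunking with an overlap could look like this:

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # Slide a window over the text; each new chunk starts (chunk_size - overlap)
    # characters after the previous one, so nothing is lost at a chunk boundary
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

chunks = chunk_text("the full text of the book ... " * 500)
print(len(chunks), "chunks created")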

After dividing the big PDF into small chunks, we pass only the relevant chunks into the context window, and the LLM can process them:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Path of the file we want to load
file_path = "./example.pdf"

# Load the file using PyPDFLoader
loader = PyPDFLoader(file_path=file_path)
docs = []

print("LOADER", loader)

# Load the file page by page
docs_lazy = loader.lazy_load()

for doc in docs_lazy:
    docs.append(doc)

# Choose the text splitter that does the chunking. We give it the size of each chunk
# and an overlap so that data is not lost where one chunk ends and the next begins
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Use the splitter to make chunks of the docs we loaded
text = text_splitter.split_documents(documents=docs)

Why do we need Embedding:-

Now that we have done the chunking and divided the data into small parts, you might think we can just put it into the vector database and we're good to go. But that's not true: if we just put raw text into the database, then when a query comes it is not possible to fetch the relevant data from the database.

Take an example like translation from English to your language:

“The sun sets in the west”

Does it make sense if you translate it word by word? Maybe sometimes, but that is rare; most of the time it does not. You have to find the relation and the semantic meaning of the words, and how they are related, to translate it well.

I don't like grammar, and I have a sensei who always corrects me.

So, just like in translation, where we place words based on their relations, we store the chunked data in such a way that words with similar meanings end up close to each other in the vector DB. That makes it easy to fetch the relevant data for a query.

Do not worry, this is all done for us by the embedding model.
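Here is a small sketch of what "similar meanings are closer" means in practice. The sentences are made up, and the cosine similarity is computed by hand; it reuses the same GoogleGenerativeAIEmbeddings class that appears in the code below, so it needs a valid API key:

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embedder = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key="Your Api key",
)

def cosine(a, b):
    # Cosine similarity: how strongly two vectors point in the same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

v1 = embedder.embed_query("The sun sets in the west")
v2 = embedder.embed_query("Sunset happens on the western horizon")
v3 = embedder.embed_query("Sodium reacts violently with water")

print("similar meaning:", cosine(v1, v2))    # should be noticeably higher
print("different meaning:", cosine(v1, v3))  # should be lower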

So far: we take a big file -> divide the file into small chunks -> create embeddings of the chunks and put them in the vector DB.

from langchain_google_genai import GoogleGenerativeAIEmbeddings
# Use the QdrantVectorStore to store the vector embeddings
from langchain_qdrant import QdrantVectorStore

# Select the Google AI embedder that will create the embeddings of our docs
embedder = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key="Your Api key",
)

# Now we make a vector store
vector_store = QdrantVectorStore.from_documents(
    documents=[],
    # The URL of the Qdrant instance to connect to
    url="http://localhost:6333",
    # The collection in the store in which we put all our embeddings
    collection_name="learning_langchain",
    # The embedder we made above
    embedding=embedder,
)

# Now that we have our store, put in the chunked docs; the embeddings are
# created and stored automatically
vector_store.add_documents(documents=text)

Retrieval & Generation -

When the user asks a query/question, we could search the raw query text directly in our database. But the user can ask anything, matching raw text is hard, and the data we get back may not really be what the user is asking about. So instead, the agent does the following:

  1. We convert the user query into an embedding.

  2. Then we take the query embedding and search our vector database for chunks with the same or similar semantic meaning.

  3. After getting the relevant chunks, we pass the ( relevant data + user query ) to our LLM, and we get the result.

from openai import OpenAI
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore

# Same embedder as before, so the query embedding matches the stored ones
embedder = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key="Your Api key",
)

# This is made to get the relevant data from our store
retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="learning_langchain",
    embedding=embedder,
)

# Gemini exposes an OpenAI-compatible endpoint, so we can use the OpenAI client
client = OpenAI(
    api_key="Your Api key",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

# Start the message array with a system prompt
message = [
    {
        "role": "system",
        "content": "Answer the user's question using the provided context. Respond in JSON format.",
    }
]

# Query given by the user
query = input("Ask the Doubt:=> ")

# We put the query into our message array
message.append({"role": "user", "content": query})

# This finds the relevant data from the database on the basis of our query
search_result = retriever.similarity_search(query)

# Here we take the data we got from the retriever and put it in the messages,
# so our LLM has the relevant data plus the query to give us a better answer
message.append({"role": "system", "content": search_result[0].page_content})

# Now just give the messages to our LLM
res = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=message,
    response_format={"type": "json_object"},
)

# Here we see the query result from our LLM
print(res.choices[0].message.content)

Connect With me:-

Github:- https://github.com/MrBlackGhostt

LinkedIn :- https://www.linkedin.com/in/mrhemantkumarr/

X(Twitter) :- https://x.com/hemant_x_bolt

Discord:- https://discordapp.com/users/810562263928406016
