How RAG Works

shubham goyal
4 min read

Introduction

An LLM's knowledge is limited to the data it was trained on. If we want the LLM to answer questions based on our own data, that is a challenge because of limited context and resources. But to use an LLM to solve real problems, we somehow need to give it the content or knowledge of our domain. That is exactly what we will look at in this article: how to provide our own large dataset to the LLM so it can answer from that data while keeping token usage within limits.

Context to LLM

We all know how to give context to an LLM. If you don't: in simple terms, we just pass our data in the system prompt, and the LLM then knows what we are referring to. The main problem is how to do this with a large dataset.
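For small amounts of data, this can be as simple as pasting the text into the system prompt. The snippet below is a minimal sketch of that naive approach (the document text, question, and model name are placeholders), and it also makes the problem obvious: a large document simply will not fit in the prompt.

# A minimal sketch of the naive approach: paste the whole document into the
# system prompt. Fine for small data, but a large document blows the token limit.
from openai import OpenAI

client = OpenAI()

our_data = "...full text of our document..."  # placeholder

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        { "role": "system", "content": f"Answer only from this context:\n{our_data}" },
        { "role": "user", "content": "What does the document say about the event loop?" },
    ]
)
print(response.choices[0].message.content)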

One way is to divide our data into small chunks and provide only the relevant chunks whenever the user queries something. The questions then become: how do we handle this large dataset, and more importantly, how do we retrieve the right chunks for a given user query? We can do this by storing the data chunks in a vector database and running a vector similarity search based on the user's query.

Vector DB

A vector DB is a database that stores vector embeddings of data. It is useful for semantic search: we can use it to store our data chunks, retrieve them later when we need them, and give them to the LLM as context. Note that we will not give the embeddings themselves to the LLM; we give it the text data linked to those embeddings.
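To get an intuition for what "similarity search" means here, the toy sketch below ranks a few made-up chunk vectors against a query vector using cosine similarity. A real vector DB does the same thing, much more efficiently, over embeddings produced by a model; the vectors and texts here are invented purely for illustration.

import numpy as np

def cosine_similarity(a, b):
    # cosine similarity: closer to 1.0 means the vectors point in the same direction
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# made-up embeddings for three stored chunks (real ones come from an embedding model)
chunks = {
    "Node.js uses a single-threaded event loop": np.array([0.9, 0.1, 0.0]),
    "Python manages memory with reference counting": np.array([0.1, 0.8, 0.1]),
    "Express is a web framework for Node.js": np.array([0.8, 0.2, 0.1]),
}

query_vector = np.array([0.85, 0.15, 0.05])  # pretend embedding of "how does Node handle concurrency?"

# rank the stored chunks by similarity to the query and keep the best matches
ranked = sorted(chunks.items(), key=lambda kv: cosine_similarity(query_vector, kv[1]), reverse=True)
for text, _ in ranked[:2]:
    print(text)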

Process

The flow of building a RAG application has two parts: first, creating chunks and their embeddings; second, retrieving them and giving them to the LLM. Let's start with the first one.

To create embeddings of our data, we first have to divide the large dataset into small chunks, because, as with everything in the AI world, there is a limit. The amount of text we can embed at once depends on the embedding model we use, so it's best to create chunks. Besides, there is little point in storing data in a vector DB if we don't chunk it, since we want to retrieve only small, relevant pieces later.

Chunking and storing

To simplify chunking the data, we will use LangChain's document loaders and text splitters:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

file_path = "/home/shubham/Downloads/nodejs.pdf"
loader = PyPDFLoader(file_path)

docs = loader.load()

# split the loaded pages into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)
  • This gives us chunks of the data along with metadata, which we will store in the vector DB.

  • We use a chunk overlap because when we split the data, some context from the previous chunk may be lost; the overlap avoids that. The short sketch below makes the overlap visible.
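A quick way to see the overlap in action is to run the splitter on a short string with tiny chunk sizes (the values here are chosen only for illustration):

from langchain_text_splitters import RecursiveCharacterTextSplitter

# tiny sizes just to make the overlap visible; real documents use much larger values
splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=15)
chunks = splitter.split_text(
    "Node.js is a JavaScript runtime built on the V8 engine. "
    "It uses an event-driven, non-blocking I/O model."
)
for chunk in chunks:
    print(repr(chunk))  # text from the end of one chunk should reappear at the start of the next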

Next, we will create the vector embeddings and store them in the vector DB:

from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-large"
)

vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    url="http://vector-db:6333",
    collection_name="learning_vectors",
    embedding=embedding_model
)
  • We used OpenAI embeddings here; you can use any embedding model.

  • We used Qdrant as the vector DB; this too can be swapped based on requirements, as the sketch below shows.
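As an example of swapping the store, the same chunks and embedding model could go into an in-memory FAISS index via LangChain's FAISS integration. This is only a sketch and assumes the faiss-cpu package is installed; the from_documents interface mirrors the Qdrant one above.

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(model="text-embedding-3-large")

# same from_documents pattern as Qdrant, but stored in a local in-memory index
vector_store = FAISS.from_documents(
    documents=split_docs,
    embedding=embedding_model
)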

Now that we have stored our data in the vector DB, it's time to use it.

Retrieval

To retrieve and use data based on a user query, we first search for the relevant chunks matching the query and then give them to the LLM as context.

First we create a vector embedding of the user's query (or, even better, ask the LLM to generate sub-queries from it and embed those; a sketch of that appears after the code below). We then perform a vector similarity search with this embedding, retrieve the relevant chunks, and give them to the LLM as context. This way we pass only the relevant parts of the original document, so we stay within the context limit and can still answer the user's question properly.

from dotenv import load_dotenv
from langchain_qdrant import QdrantVectorStore
from langchain_openai import OpenAIEmbeddings
from openai import OpenAI

load_dotenv()

client = OpenAI()

# Vector Embeddings
embedding_model = OpenAIEmbeddings(
    model="text-embedding-3-large"
)

vector_db = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="learning_vectors",
    embedding=embedding_model
)

# Take User Query
query = input("> ")

# Vector Similarity Search [query] in DB
search_results = vector_db.similarity_search(
    query=query
)

context = "\n\n\n".join(
    [
        f"Page Content: {result.page_content}\nPage Number: {result.metadata['page_label']}\nFile Location: {result.metadata['source']}"
        for result in search_results
    ]
)

SYSTEM_PROMPT = f"""
    You are a helpful AI Assistant who answers the user's query based on the available context
    retrieved from a PDF file, along with the page contents and page numbers.

    You should only answer the user based on the following context and navigate the user
    to open the right page number to know more.

    Context:
    {context}
"""

chat_completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        { "role": "system", "content": SYSTEM_PROMPT },
        { "role": "user", "content": query },
    ]
)

print(f"🤖: {chat_completion.choices[0].message.content}")
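The sub-query idea mentioned earlier is not part of the script above, but a rough sketch of it could look like the following. It reuses client, query, and vector_db from the script; the prompt wording and the line-based parsing are just assumptions.

# Ask the LLM to break the user's question into simpler search queries,
# then run a similarity search for each one and pool the results as context.
decomposition = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{
        "role": "user",
        "content": f"Break this question into 2-3 short, self-contained search queries, one per line:\n\n{query}"
    }]
)
sub_queries = [q.strip() for q in decomposition.choices[0].message.content.splitlines() if q.strip()]

all_results = []
for sub_query in sub_queries:
    all_results.extend(vector_db.similarity_search(query=sub_query, k=3))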

Conclusion

We have successfully created a RAG AI agent: we can now store a large dataset and retrieve the relevant parts based on the user's query.
