Basic RAG for PDF chat - Short & Crisp

Satyajit Patel
2 min read

Overview

So, RAG stands for Retrieval-Augmented Generation. It's the process of injecting external data into a prompt so the model can produce the desired response. That external data could come from a database, the web, API calls, local files, etc.

What to do

Injection Phase

  • Accept the user's PDF

  • Make vector embeddings

  • Store them in a vector DB

Retrieval Phase

  • Accept the user's query

  • Make vector embeddings of the query

  • Search the vector DB and retrieve relevant chunks

Now, based on these relevant chunks + the user query, we ask the LLM and return its response to the user. That's a basic, typical RAG. You can choose whichever embedding model, LLM, and vector store you prefer.

How to do

Loader

from langchain_community.document_loaders import PyPDFLoader

file_path = "./public/file_name.pdf"
loader = PyPDFLoader(file_path)
docs = loader.load()
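
To sanity-check the load: each element of docs is a LangChain Document (PyPDFLoader produces roughly one per page) with page_content and metadata. A quick inspection, assuming the file_path above:

print(len(docs))                    # number of pages loaded
print(docs[0].metadata)             # e.g. {'source': './public/file_name.pdf', 'page': 0, ...}
print(docs[0].page_content[:200])   # first 200 characters of the first page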

Chunking

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
split_docs = text_splitter.split_documents(docs)  # reuse the docs loaded above instead of re-reading the PDF
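
A note on the numbers: chunk_overlap=100 gives each ~1000-character chunk roughly 10% overlap with its neighbors, so a sentence that straddles a chunk boundary still appears intact in at least one chunk.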

Make vector embeddings & store them in a vector DB

import os

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

qdrant = QdrantVectorStore.from_documents(
    documents=[],  # empty document list on first run; embeddings are added below
    embedding=embeddings,
    collection_name="learning-genai",
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)

qdrant.add_documents(documents=split_docs)

retriever = QdrantVectorStore.from_existing_collection(
    embedding=embeddings,
    collection_name="learning-genai",
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)
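
One assumption baked into the snippet above: QDRANT_URL and QDRANT_API_KEY (plus the Google API key the embeddings client reads) must be set in the environment. A common pattern is to keep them in a local .env file and load it with python-dotenv:

# assumes credentials live in a local .env file
from dotenv import load_dotenv

load_dotenv()  # makes QDRANT_URL, QDRANT_API_KEY, GOOGLE_API_KEY visible to os.getenv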

Search the vector DB and retrieve relevant chunks

from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate

user_query = "take the user query as input"  # placeholder; read this from your UI or stdin

relevant_chunks = retriever.similarity_search(query=user_query)
# for res in relevant_chunks:
#     print(f"{res.page_content} [{res.metadata}]")

llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0,
)

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant who responds based on the available context. {context}"),
    ("user", "Give a short and crisp answer to this query: {query}"),
])

context = "\n\n".join(chunk.page_content for chunk in relevant_chunks)  # keep only the text, drop metadata
prompt = prompt_template.invoke({"context": context, "query": user_query})
response = llm.invoke(prompt)
print(response.content)
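
Putting the retrieval phase together, here's a minimal sketch of a reusable helper (the name ask_pdf is my own, not from the post) that reuses the retriever, prompt_template, and llm defined above:

def ask_pdf(query: str) -> str:
    # retrieve the chunks most similar to the query
    chunks = retriever.similarity_search(query=query)
    context = "\n\n".join(chunk.page_content for chunk in chunks)
    # stuff context + query into the prompt and ask the LLM
    prompt = prompt_template.invoke({"context": context, "query": query})
    return llm.invoke(prompt).content

print(ask_pdf("What is this document about?"))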

💡
If you want to improve accuracy, there are various RAG techniques like re-ranking, query decomposition, HyDE (Hypothetical Document Embeddings), etc. that you can explore. If you'd like to get in touch, here's my portfolio.
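
As a taste of one of those techniques, here's a minimal HyDE sketch (my illustration, not from the post): instead of embedding the raw query, first ask the LLM to write a hypothetical answer and search with that, since the made-up answer often sits closer to the relevant chunks in embedding space.

# HyDE sketch: search with a hypothetical answer instead of the raw query
hypothetical = llm.invoke(
    f"Write a short passage that could answer this question: {user_query}"
).content
relevant_chunks = retriever.similarity_search(query=hypothetical)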