Basic RAG for PDF chat - Short & Crisp

Satyajit Patel
2 min read

Overview

So, RAG stands for Retrieval-Augmented Generation. It's the process of injecting external data into a prompt so the model can produce the desired response. That external data could come from a database, the web, API calls, local files, etc.

What to do

Injection Phase

  • Accept the user's PDF

  • Make vector embeddings

  • Store them in a vector DB

Retrieval Phase

  • Accept the user's query

  • Make vector embeddings of the query

  • Search the vector DB and retrieve relevant chunks

Now, based on these relevant chunks + the user query, we ask the LLM and return its response to the user. That's a basic, typical RAG. You can choose whichever embedding model, LLM, and vector store you prefer.

How to do

Loader

from langchain_community.document_loaders import PyPDFLoader

file_path = "./public/file_name.pdf"
loader = PyPDFLoader(file_path)
docs = loader.load()
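
To sanity-check the load: each element of docs is a LangChain Document (PyPDFLoader produces roughly one per page) with page_content and metadata. A quick inspection, assuming the file_path above:

print(len(docs))                    # number of pages loaded
print(docs[0].metadata)             # e.g. {'source': './public/file_name.pdf', 'page': 0, ...}
print(docs[0].page_content[:200])   # first 200 characters of the first page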

Chunking

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
split_docs = text_splitter.split_documents(docs)  # reuse the docs loaded above instead of re-reading the PDF
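
A note on the numbers: chunk_overlap=100 gives each ~1000-character chunk roughly 10% overlap with its neighbors, so a sentence that straddles a chunk boundary still appears intact in at least one chunk.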

Make vector embeddings & store them in a vector DB

import os

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

qdrant = QdrantVectorStore.from_documents(
    documents=[],  # empty document list on first run; embeddings are added below
    embedding=embeddings,
    collection_name="learning-genai",
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)

qdrant.add_documents(documents=split_docs)

retriever = QdrantVectorStore.from_existing_collection(
    embedding=embeddings,
    collection_name="learning-genai",
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY"),
)
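
One assumption baked into the snippet above: QDRANT_URL and QDRANT_API_KEY (plus the Google API key the embeddings client reads) must be set in the environment. A common pattern is to keep them in a local .env file and load it with python-dotenv:

# assumes credentials live in a local .env file
from dotenv import load_dotenv

load_dotenv()  # makes QDRANT_URL, QDRANT_API_KEY, GOOGLE_API_KEY visible to os.getenv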

Search the vector DB and retrieve relevant chunks

from langchain_groq import ChatGroq
from langchain_core.prompts import ChatPromptTemplate

user_query = "take the user query as input"  # placeholder; read this from your UI or stdin

relevant_chunks = retriever.similarity_search(query=user_query)
# for res in relevant_chunks:
#     print(f"{res.page_content} [{res.metadata}]")

llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0,
)

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful AI assistant who responds based on the available context. {context}"),
    ("user", "Give a short and crisp answer to this query: {query}"),
])

context = "\n\n".join(chunk.page_content for chunk in relevant_chunks)  # keep only the text, drop metadata
prompt = prompt_template.invoke({"context": context, "query": user_query})
response = llm.invoke(prompt)
print(response.content)
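
Putting the retrieval phase together, here's a minimal sketch of a reusable helper (the name ask_pdf is my own, not from the post) that reuses the retriever, prompt_template, and llm defined above:

def ask_pdf(query: str) -> str:
    # retrieve the chunks most similar to the query
    chunks = retriever.similarity_search(query=query)
    context = "\n\n".join(chunk.page_content for chunk in chunks)
    # stuff context + query into the prompt and ask the LLM
    prompt = prompt_template.invoke({"context": context, "query": query})
    return llm.invoke(prompt).content

print(ask_pdf("What is this document about?"))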

💡
If you want to improve accuracy, there are various RAG techniques like re-ranking, query decomposition, HyDE (Hypothetical Document Embeddings), etc. that you can explore. If you'd like to get in touch, here's my portfolio.
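
As a taste of one of those techniques, here's a minimal HyDE sketch (my illustration, not from the post): instead of embedding the raw query, first ask the LLM to write a hypothetical answer and search with that, since the made-up answer often sits closer to the relevant chunks in embedding space.

# HyDE sketch: search with a hypothetical answer instead of the raw query
hypothetical = llm.invoke(
    f"Write a short passage that could answer this question: {user_query}"
).content
relevant_chunks = retriever.similarity_search(query=hypothetical)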