Understanding RAG: A Comprehensive Guide

somil

RAG (Retrieval-Augmented Generation) is a powerful technique in Generative AI that combines a retrieval system with a language model to generate more accurate, grounded, and up-to-date responses.

How RAG Works

  1. Query Input
    A user asks a question or gives a prompt.

  2. Retriever (Search Component)
    The system searches a document store (such as PDFs, websites, or databases) using vector similarity or keyword-based search (e.g., FAISS, Elasticsearch).

  3. Retriever Output
    Top k relevant documents are returned (e.g., paragraphs, chunks).

  4. Generator (LLM)
    The retrieved documents are combined with the query and sent to a language model (like GPT or LLaMA), which generates a response grounded in both the query and the retrieved context (a minimal sketch of this loop appears after the list).
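
Taken together, these four steps form a simple loop: embed the query, retrieve the most similar chunks, assemble a prompt, and generate. Here is a minimal sketch in Python, where embed, search_index, and call_llm are hypothetical helpers standing in for whatever embedding model, vector store, and LLM you use:

def answer_with_rag(query, index, k=4):
    # 1. Query input: embed the user's question
    query_vector = embed(query)  # hypothetical embedding helper

    # 2-3. Retriever: fetch the top-k most similar chunks from the document store
    chunks = search_index(index, query_vector, k=k)  # hypothetical vector search

    # 4. Generator: combine the query with the retrieved context and call the LLM
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)  # hypothetical LLM call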

Why Use RAG?

  • Overcomes the knowledge cutoff of LLMs

  • Reduces hallucinations

  • Enables domain-specific answers (legal, medical, business, etc.)

  • Makes models more trustworthy and explainable

RAG Variants

  • Standard RAG – simple retrieval + LLM

  • RAG-Fusion – generates multiple query variants and fuses their retrieval results

  • Multi-hop RAG – reasoning across multiple documents over successive retrieval steps

  • Conversational RAG – context-aware over chat history

Two-Step Process in RAG: Indexing and Retrieving

Retrieval-Augmented Generation (RAG) relies on a two-step process to fetch relevant external knowledge before generating a response. The two key steps are:

1. Indexing (Preprocessing Phase)

This is a one-time or periodic process where your data is prepared and stored for efficient retrieval.

Steps in Indexing:

  • Chunking: Break large documents (e.g., PDFs, blogs, reports) into smaller chunks (like 200–500 words).

  • Embedding: Convert each chunk into a high-dimensional vector using an embedding model (e.g., OpenAI, HuggingFace, BGE).

  • Storing: Store the vectors and their corresponding text chunks in a vector database (e.g., FAISS, Chroma, Pinecone).

Example:

Text: "The mitochondria are the powerhouse of the cell."
→ Embedding → Stored as a vector in FAISS with reference to the original text.
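
As a concrete sketch of the indexing phase, here is one way to chunk, embed, and store text with FAISS directly. It assumes the sentence-transformers package and the all-MiniLM-L6-v2 model, "document.txt" is a placeholder input file, and the chunking is deliberately naive (fixed word windows, no overlap):

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Chunking: split the document into fixed-size word windows
def chunk_text(text, words_per_chunk=300):
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

chunks = chunk_text(open("document.txt").read())

# Embedding: one vector per chunk, normalized so inner product = cosine similarity
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(chunks, normalize_embeddings=True)

# Storing: FAISS index for the vectors, plus the chunk list for lookup by position
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))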

2. Retrieving (Query-Time Phase)

This step happens every time a user asks a question.

Steps in Retrieval:

  • Embed the query using the same embedding model.

  • Search the vector DB for the most similar chunks using vector similarity (like cosine similarity).

  • Return the Top-k relevant chunks to feed into the language model along with the original question.

Example:

Query: "What is the function of mitochondria?"
→ Embedding → Compare against stored vectors → Retrieve the chunk above
→ Send it to the LLM for grounded response generation.
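
Continuing the indexing sketch above, the query-time phase embeds the question with the same model and asks FAISS for the nearest chunks:

# Embed the query with the same embedding model used during indexing
query = "What is the function of mitochondria?"
query_vector = model.encode([query], normalize_embeddings=True)

# Vector similarity search: top-k chunks by cosine similarity
k = 3
scores, ids = index.search(np.asarray(query_vector, dtype="float32"), k)
top_chunks = [chunks[i] for i in ids[0]]

# These chunks go into the LLM prompt alongside the original question
print(top_chunks)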

Basic RAG structure: query → retriever → top-k chunks → LLM → grounded answer

LangChain Introduction

LangChain is a powerful open-source framework designed to help developers build applications powered by large language models (LLMs). It simplifies the process of integrating LLMs with external data sources, memory, tools, and workflows like RAG (Retrieval-Augmented Generation).

LangChain is especially useful for building:

  • Custom chatbots

  • PDF Q&A assistants

  • RAG pipelines

  • Agents that can browse, code, or search

Core Concepts in LangChain

  • LLMs – connect to models like GPT, Claude, or LLaMA

  • Prompts – templates and input formatting for LLMs

  • Chains – sequences of LLM calls (e.g., input → prompt → output)

  • Agents – LLMs that decide which tools to use step-by-step

  • Retrievers – fetch relevant context from vector stores

  • Tools – interfaces for search, APIs, file systems, etc.
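
To make the prompt and chain concepts concrete, here is a minimal example using the classic LangChain API (PromptTemplate plus LLMChain); newer releases favor the runnable/pipe syntax, so treat this as a sketch rather than the current recommended style:

from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain

# Prompt: a template with a single input variable
prompt = PromptTemplate.from_template("Explain {topic} in one sentence.")

# LLM: connects to an OpenAI chat model (requires OPENAI_API_KEY in the environment)
llm = ChatOpenAI(temperature=0)

# Chain: input -> prompt -> LLM -> output
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="retrieval-augmented generation"))

A chain like this is roughly what RetrievalQA composes internally: the retriever's chunks are formatted into a prompt template, which is then passed to the LLM.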

LangChain PDF Chatbot – Basic Workflow

1. Install Required Packages

pip install langchain openai faiss-cpu pypdf tiktoken

2. Code Overview

from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# 1. Load the PDF
loader = PyPDFLoader("your_file.pdf")
documents = loader.load()

# 2. Split into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)

# 3. Convert text chunks to embeddings
embeddings = OpenAIEmbeddings()  # You can use HuggingFaceEmbeddings instead

# 4. Store vectors in a vector database
vectorstore = FAISS.from_documents(docs, embeddings)

# 5. Create retriever and chain
retriever = vectorstore.as_retriever()
llm = ChatOpenAI(temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)

# 6. Ask a question
query = "What are the key findings in the document?"
response = qa_chain.run(query)
print(response)

What Happens Behind the Scenes

  • PDF is chunked → each chunk is embedded → stored in FAISS.

  • Query is embedded → similar chunks are retrieved → the LLM receives the query plus the relevant chunks → generates a grounded answer.
