Understanding RAG Indexing


Introduction
Retrieval-Augmented Generation (RAG) helps AI systems provide better answers by combining their knowledge with external information. Indexing is a key part of this—it organizes the data so AI can find and use it effectively. This guide explains RAG indexing in simple terms and offers clear optimization tips.
What Is RAG Indexing?
RAG indexing is the process of organizing data so it can be efficiently searched and retrieved as relevant context for an LLM. Think of it like setting up a smart library for your AI: it prepares and organizes information so it can be found quickly, in real time, to answer user questions.
Why Is Indexing Important?
Speed: Helps AI respond faster.
Accuracy: Ensures responses are relevant.
Efficiency: Reduces computing costs and effort. Because the context window is limited, only the most relevant content should be sent to the model.
Basic Steps in RAG Indexing
Data Loading: Gather data from websites, documents, databases, etc.
Document Chunking: Break documents into smaller, meaningful parts called "chunks." This keeps each piece within the model's token limits (its context window).
Embedding: Convert each chunk into a numeric vector using an embedding model. These vectors capture semantic meaning for similarity search.
Vector Storage: Save the vectors in a vector database (e.g., Pinecone, Qdrant) for fast and efficient similarity search.
User Query Resolution:
The user’s query is also converted into a vector using the same embedding model.
This query vector is then compared against the stored vectors in the vector database (similarity search).
The system retrieves the most semantically similar content.
This content is combined with the original user query and sent to the LLM.
The LLM then generates a contextually accurate and informative response based on both the query and the retrieved data.
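Under the hood, the "similarity" in similarity search is typically cosine similarity between embedding vectors. The vector database computes this at scale, but a minimal TypeScript sketch shows the idea (the example vectors below are made up):

// Minimal sketch of cosine similarity, a metric vector databases
// such as Qdrant commonly use to compare a query embedding against
// stored chunk embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Illustrative vectors: the chunk scoring closest to 1 is the best match.
console.log(cosineSimilarity([0.2, 0.9, 0.1], [0.25, 0.85, 0.12])); // high similarity
console.log(cosineSimilarity([0.2, 0.9, 0.1], [0.9, 0.05, 0.4])); // low similarity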
Indexing Challenges and Optimizations
Content Quality & Preprocessing
Clean, standardize, and preprocess content before indexing.
Remove outdated or irrelevant information to improve retrieval and avoid noisy responses (a minimal cleanup sketch follows this list).
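What "cleaning" means depends entirely on your sources; as an illustrative assumption, a first pass over HTML-derived text might look like this:

// Illustrative preprocessing sketch: strip HTML tags and collapse
// whitespace before chunking. Real pipelines need rules tailored
// to their own data (boilerplate removal, deduplication, etc.).
function cleanText(raw: string): string {
  return raw
    .replace(/<[^>]+>/g, " ") // drop HTML tags
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}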
Metadata Handling
Store metadata such as file origin, chunk positions, timestamps, and topic tags.
Capture relationships between chunks to allow richer, more relevant retrieval (see the sketch below).
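In LangChain, metadata travels alongside each chunk as a plain object on the Document. A minimal sketch; the field names here are illustrative, not a fixed schema:

import { Document } from "@langchain/core/documents";

// Illustrative: attach provenance metadata to a chunk so it can be
// filtered, cited, or related to neighboring chunks at retrieval time.
const chunk = new Document({
  pageContent: "Refunds are processed within 14 business days...",
  metadata: {
    source: "policies.pdf", // file origin
    chunkIndex: 12, // position within the source document
    indexedAt: new Date().toISOString(), // timestamp
    topic: "refunds", // topic tag
  },
});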
Optimizing Chunking Techniques
Fixed-Size: Useful for predictable, structured data but may lose semantic meaning.
Semantic: Break text at sentence or paragraph boundaries for a more natural information flow.
Overlapping: Create context continuity between chunks by overlapping them slightly (see the sketch after this list).
Small2Big: Start with fine-grained chunks like sentences, then aggregate nearby ones during retrieval.
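Fixed-size and overlapping chunking map directly onto splitter parameters in LangChain. A quick sketch comparing the two; the sample text and the deliberately tiny sizes are just for illustration:

import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Tiny sizes make the difference easy to see in the console output.
const sample =
  "RAG indexing organizes data for retrieval. " +
  "Chunking splits documents into smaller pieces. " +
  "Overlap preserves context across chunk boundaries.";

const fixedSize = new RecursiveCharacterTextSplitter({
  chunkSize: 60,
  chunkOverlap: 0, // hard cuts: cheap and predictable, but context can be lost
});

const overlapping = new RecursiveCharacterTextSplitter({
  chunkSize: 60,
  chunkOverlap: 20, // neighboring chunks share ~20 characters of context
});

console.log(await fixedSize.splitText(sample));
console.log(await overlapping.splitText(sample));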
Chunk Size Optimization
Adjust chunk size to balance retaining context against staying within LLM token limits.
Too large: may dilute important content with irrelevant details, which invites hallucinations.
Too small: can lose essential context or meaning.
Optimal size: there is no universal answer; determine it empirically for your content and use case (a starting-point sketch follows).
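A full evaluation would measure answer quality on representative queries, but a cheap first pass is to split the same document at several candidate sizes and inspect the granularity. A sketch (the file path is a placeholder):

import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Illustrative sizing pass: compare how candidate chunk sizes carve up
// the same document before committing to a full index-and-evaluate loop.
const docs = await new PDFLoader("./docs/handbook.pdf").load();

for (const chunkSize of [300, 600, 1200]) {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize,
    chunkOverlap: Math.round(chunkSize * 0.1), // keep ~10% overlap at every size
  });
  const chunks = await splitter.splitDocuments(docs);
  console.log(`chunkSize ${chunkSize}: ${chunks.length} chunks`);
  console.log(chunks[0].pageContent.slice(0, 120)); // peek at the first chunk
}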
Implementation with LangChain
This section shows how to build a RAG system using LangChain, OpenAI, and Qdrant.
PDF Indexing Implementation
The indexPDF function handles loading a PDF, splitting it into chunks, generating embeddings, and storing them in Qdrant:
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { OpenAIEmbeddings } from "@langchain/openai";
import { QdrantVectorStore } from "@langchain/qdrant";

export async function indexPDF(pdfPath: string, collectionName: string) {
  try {
    // 1. Load the PDF document
    const loader = new PDFLoader(pdfPath);
    const docs = await loader.load();

    // 2. Split the document into chunks
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 600, // each chunk holds ~600 characters
      chunkOverlap: 80, // overlap between chunks to maintain context
    });
    const splitDocs = await textSplitter.splitDocuments(docs);

    // 3. Initialize OpenAI embeddings
    const embedder = new OpenAIEmbeddings({
      apiKey: process.env.OPENAI_API_KEY,
      model: "text-embedding-3-large",
    });

    // 4. Store documents in Qdrant
    await QdrantVectorStore.fromDocuments(splitDocs, embedder, {
      url: process.env.QDRANT_URL,
      collectionName: collectionName,
    });

    console.log(`Successfully indexed PDF to collection: ${collectionName}`);
  } catch (error) {
    console.error("Error indexing PDF:", error);
    throw error;
  }
}
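A hypothetical invocation; the path and collection name are placeholders:

// Hypothetical usage: index a local PDF into a Qdrant collection.
await indexPDF("./docs/handbook.pdf", "handbook");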
Query Implementation
The queryCollection function handles retrieving relevant content and generating responses:
import { OpenAIEmbeddings } from "@langchain/openai";
import OpenAI from "openai";
import { QdrantVectorStore } from "@langchain/qdrant";

export async function queryCollection(userQuery: string, collectionName: string) {
  try {
    // 1. Initialize embeddings
    const embedder = new OpenAIEmbeddings({
      apiKey: process.env.OPENAI_API_KEY,
      model: "text-embedding-3-large",
    });

    // 2. Connect to existing Qdrant collection
    const vectorStore = await QdrantVectorStore.fromExistingCollection(embedder, {
      url: process.env.QDRANT_URL,
      collectionName: collectionName,
    });

    // 3. Perform similarity search
    const relevantDocs = await vectorStore.similaritySearch(userQuery);
    const context = relevantDocs.map((doc) => doc.pageContent).join("\n");

    // 4. Generate response using OpenAI
    const SYSTEM_PROMPT = `
You are a helpful AI Assistant who responds to user queries based on the available context.

Context:
${context}
`;

    const openai = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY!,
    });

    // 5. Get completion from OpenAI
    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [
        { role: "system", content: SYSTEM_PROMPT },
        { role: "user", content: userQuery },
      ],
    });

    return response.choices[0].message.content;
  } catch (error) {
    console.error("Error querying collection:", error);
    throw error;
  }
}
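And a hypothetical call against the collection indexed earlier:

// Hypothetical usage: ask a question against the "handbook" collection.
const answer = await queryCollection(
  "What does the handbook say about refunds?",
  "handbook"
);
console.log(answer);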
For a complete working example, including all the code snippets shared above, visit my GitHub repository here: GitHub Repo
Conclusion
RAG indexing is the backbone of smart, responsive AI. With the right strategies, such as thoughtful chunking, tuned chunk sizes, metadata handling, and content quality control, your AI can deliver fast, accurate, and relevant answers.