Understanding RAG Indexing


Introduction
Retrieval-Augmented Generation (RAG) helps AI systems provide better answers by combining their knowledge with external information. Indexing is a key part of this—it organizes the data so AI can find and use it effectively. This guide explains RAG indexing in simple terms and offers clear optimization tips.
What Is RAG Indexing?
RAG indexing is the process of organizing data so it can be efficiently searched and retrieved as relevant context for an LLM. Think of it like setting up a smart library for your AI: it prepares and organizes information so it can be found quickly, in real time, to answer user questions.
Why Is Indexing Important?
Speed: Helps AI respond faster.
Accuracy: Ensures responses are relevant.
Efficiency: Reduces computing costs and effort. Because the context window is limited, only the most relevant content should be sent to the model.
Basic Steps in RAG Indexing
Data Loading: Gather data from websites, documents, databases, etc.
Document Chunking: Break documents into smaller, meaningful parts called "chunks." This keeps each piece within the model's token limits (its context window).
Embedding: Convert each chunk into a numeric vector using an embedding model. These vectors capture semantic meaning for similarity search.
Vector Storage: Save the vectors in a vector database (e.g., Pinecone, Qdrant) for fast and efficient similarity search.
User Query Resolution:
The user’s query is also converted into a vector using the same embedding model.
This query vector is then compared against the stored vectors in the vector database (similarity search).
The system retrieves the most semantically similar content.
This content is combined with the original user query and sent to the LLM.
The LLM then generates a contextually accurate and informative response based on both the query and the retrieved data.
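Under the hood, the "similarity" in similarity search is typically cosine similarity between embedding vectors. The vector database computes this at scale, but a minimal TypeScript sketch shows the idea (the example vectors below are made up):

// Minimal sketch of cosine similarity, a metric vector databases
// such as Qdrant commonly use to compare a query embedding against
// stored chunk embeddings.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Illustrative vectors: the chunk scoring closest to 1 is the best match.
console.log(cosineSimilarity([0.2, 0.9, 0.1], [0.25, 0.85, 0.12])); // high similarity
console.log(cosineSimilarity([0.2, 0.9, 0.1], [0.9, 0.05, 0.4])); // low similarity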
Indexing Challenges and Optimizations
Content Quality & Preprocessing
Clean, standardize, and preprocess content before indexing.
Remove outdated or irrelevant information to improve retrieval and avoid noisy responses (a minimal cleanup sketch follows this list).
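What "cleaning" means depends entirely on your sources; as an illustrative assumption, a first pass over HTML-derived text might look like this:

// Illustrative preprocessing sketch: strip HTML tags and collapse
// whitespace before chunking. Real pipelines need rules tailored
// to their own data (boilerplate removal, deduplication, etc.).
function cleanText(raw: string): string {
  return raw
    .replace(/<[^>]+>/g, " ") // drop HTML tags
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}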
Metadata Handling
Store metadata such as file origin, chunk positions, timestamps, and topic tags.
Capture relationships between chunks to allow richer, more relevant retrieval (see the sketch below).
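In LangChain, metadata travels alongside each chunk as a plain object on the Document. A minimal sketch; the field names here are illustrative, not a fixed schema:

import { Document } from "@langchain/core/documents";

// Illustrative: attach provenance metadata to a chunk so it can be
// filtered, cited, or related to neighboring chunks at retrieval time.
const chunk = new Document({
  pageContent: "Refunds are processed within 14 business days...",
  metadata: {
    source: "policies.pdf", // file origin
    chunkIndex: 12, // position within the source document
    indexedAt: new Date().toISOString(), // timestamp
    topic: "refunds", // topic tag
  },
});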
Optimizing Chunking Techniques
Fixed-Size: Useful for predictable, structured data but may lose semantic meaning.
Semantic: Break text at sentence or paragraph boundaries for a more natural information flow.
Overlapping: Create context continuity between chunks by overlapping them slightly (see the sketch after this list).
Small2Big: Start with fine-grained chunks like sentences, then aggregate nearby ones during retrieval.
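Fixed-size and overlapping chunking map directly onto splitter parameters in LangChain. A quick sketch comparing the two; the sample text and the deliberately tiny sizes are just for illustration:

import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Tiny sizes make the difference easy to see in the console output.
const sample =
  "RAG indexing organizes data for retrieval. " +
  "Chunking splits documents into smaller pieces. " +
  "Overlap preserves context across chunk boundaries.";

const fixedSize = new RecursiveCharacterTextSplitter({
  chunkSize: 60,
  chunkOverlap: 0, // hard cuts: cheap and predictable, but context can be lost
});

const overlapping = new RecursiveCharacterTextSplitter({
  chunkSize: 60,
  chunkOverlap: 20, // neighboring chunks share ~20 characters of context
});

console.log(await fixedSize.splitText(sample));
console.log(await overlapping.splitText(sample));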
Chunk Size Optimization
Adjust chunk size to balance retaining context against staying within LLM token limits.
Too large: may dilute important content with irrelevant details, which invites hallucinations.
Too small: can lose essential context or meaning.
Optimal size: there is no universal answer; determine it empirically for your content and use case (a starting-point sketch follows).
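A full evaluation would measure answer quality on representative queries, but a cheap first pass is to split the same document at several candidate sizes and inspect the granularity. A sketch (the file path is a placeholder):

import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Illustrative sizing pass: compare how candidate chunk sizes carve up
// the same document before committing to a full index-and-evaluate loop.
const docs = await new PDFLoader("./docs/handbook.pdf").load();

for (const chunkSize of [300, 600, 1200]) {
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize,
    chunkOverlap: Math.round(chunkSize * 0.1), // keep ~10% overlap at every size
  });
  const chunks = await splitter.splitDocuments(docs);
  console.log(`chunkSize ${chunkSize}: ${chunks.length} chunks`);
  console.log(chunks[0].pageContent.slice(0, 120)); // peek at the first chunk
}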
Implementation with LangChain
This section shows how to build a RAG system using LangChain, OpenAI, and Qdrant.
PDF Indexing Implementation
The indexPDF function handles loading a PDF, splitting it into chunks, generating embeddings, and storing them in Qdrant:
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { OpenAIEmbeddings } from "@langchain/openai";
import { QdrantVectorStore } from "@langchain/qdrant";

export async function indexPDF(pdfPath: string, collectionName: string) {
  try {
    // 1. Load the PDF document
    const loader = new PDFLoader(pdfPath);
    const docs = await loader.load();

    // 2. Split the document into chunks
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 600, // each chunk holds ~600 characters
      chunkOverlap: 80, // overlap between chunks to maintain context
    });
    const splitDocs = await textSplitter.splitDocuments(docs);

    // 3. Initialize OpenAI embeddings
    const embedder = new OpenAIEmbeddings({
      apiKey: process.env.OPENAI_API_KEY,
      model: "text-embedding-3-large",
    });

    // 4. Store documents in Qdrant
    await QdrantVectorStore.fromDocuments(splitDocs, embedder, {
      url: process.env.QDRANT_URL,
      collectionName: collectionName,
    });

    console.log(`Successfully indexed PDF to collection: ${collectionName}`);
  } catch (error) {
    console.error("Error indexing PDF:", error);
    throw error;
  }
}
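A hypothetical invocation; the path and collection name are placeholders:

// Hypothetical usage: index a local PDF into a Qdrant collection.
await indexPDF("./docs/handbook.pdf", "handbook");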
Query Implementation
The queryCollection function handles retrieving relevant content and generating responses:
import { OpenAIEmbeddings } from "@langchain/openai";
import OpenAI from "openai";
import { QdrantVectorStore } from "@langchain/qdrant";

export async function queryCollection(userQuery: string, collectionName: string) {
  try {
    // 1. Initialize embeddings
    const embedder = new OpenAIEmbeddings({
      apiKey: process.env.OPENAI_API_KEY,
      model: "text-embedding-3-large",
    });

    // 2. Connect to existing Qdrant collection
    const vectorStore = await QdrantVectorStore.fromExistingCollection(embedder, {
      url: process.env.QDRANT_URL,
      collectionName: collectionName,
    });

    // 3. Perform similarity search
    const relevantDocs = await vectorStore.similaritySearch(userQuery);
    const context = relevantDocs.map((doc) => doc.pageContent).join("\n");

    // 4. Generate response using OpenAI
    const SYSTEM_PROMPT = `
You are a helpful AI Assistant who responds to user queries based on the available context.

Context:
${context}
`;

    const openai = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY!,
    });

    // 5. Get completion from OpenAI
    const response = await openai.chat.completions.create({
      model: "gpt-4",
      messages: [
        { role: "system", content: SYSTEM_PROMPT },
        { role: "user", content: userQuery },
      ],
    });

    return response.choices[0].message.content;
  } catch (error) {
    console.error("Error querying collection:", error);
    throw error;
  }
}
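And a hypothetical call against the collection indexed earlier:

// Hypothetical usage: ask a question against the "handbook" collection.
const answer = await queryCollection(
  "What does the handbook say about refunds?",
  "handbook"
);
console.log(answer);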
For a complete working example, including all the code snippets shared above, visit my GitHub repository here: GitHub Repo
Conclusion
RAG indexing is the backbone of smart, responsive AI. With the right strategies, such as thoughtful chunking, tuned chunk sizes, metadata handling, and content quality control, your AI can deliver fast, accurate, and relevant answers.