Step-by-Step Guide to Creating a RAG AI PDF Bot 🤖

Aarush Gupta
5 min read

🧠 What if your chatbot could read PDFs and answer like ChatGPT?

  • A RAG bot is an AI-powered chatbot that uses Retrieval-Augmented Generation (RAG) to answer questions more accurately by pulling information from external data sources — like PDFs, documents, or a knowledge base.

  • I built a RAG bot to deepen my understanding of Generative AI, as Retrieval-Augmented Generation is a pivotal technology that enhances real-world applications of AI and simplifies everyday tasks.

  • Tech used: Next.js, the OpenAI API, Qdrant (a vector database for storing embeddings), and Docker.

⚙️ What is RAG (Retrieval-Augmented Generation)?

Retrieval-Augmented Generation (RAG) is a powerful AI architecture that combines two core capabilities:

  1. Retrieval of information from external knowledge sources

  2. Generation of responses using a Large Language Model (LLM) such as GPT or Claude Sonnet

🤖 Why is this important?

Most language models (like ChatGPT) generate answers based on the data they were trained on. This can lead to hallucinations or outdated responses when asked about specific or recent topics.

RAG fixes that by injecting fresh, factual information into the model's context at runtime.

How a Retrieval-Augmented Generation (RAG) System Works

  1. User Input

    • The user enters a query into the app (e.g., a question about an uploaded PDF).

  2. Query Embedding

    • The app converts the user query into a vector embedding.

    • This embedding is used to search the vector database (like Qdrant) for the most relevant content chunks.

  3. Chunk Retrieval

    • The app retrieves the top matching text chunks related to the user query from the database.

    • These chunks come from preprocessed documents (PDFs, text files, etc.) that were split during chunking.

  4. Sending to LLM

    • The app sends both the query and the retrieved chunks to the LLM (e.g., OpenAI GPT).

  5. LLM Generates Response

    • The LLM processes the combined context (query + chunks) and generates a grounded, accurate answer.

  6. Final Output

    • The app receives the LLM’s answer and returns it to the user in the chat interface.

Implementation and Code Overview:

This is the overall structure of my project: there are essentially just two files, one for upload and one for query. A rough (assumed) layout is sketched below.
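
The exact file names aren’t listed in the post, so the layout below is only an assumed Next.js App Router structure for illustration:

app/
  api/
    upload/route.ts     // PDF upload → extract text → chunk → embed → store in Qdrant
    query/route.ts      // user question → embed → retrieve chunks → prompt GPT → answer
docker-compose.yml      // local Qdrant instance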

1) PDF Upload & Text Extraction:

  • LangChain’s PDFLoader extracts the text from the uploaded PDF.

  • The text is then split into overlapping chunks with RecursiveCharacterTextSplitter so each chunk fits comfortably in the embedding and prompt context.

// Import paths may differ slightly depending on your LangChain version
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Load the uploaded PDF and extract its text as LangChain documents
const loader = new PDFLoader(filePath);
const rawDocs = await loader.load();

// Split into overlapping ~1000-character chunks so each embedding stays focused
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const docs = await splitter.splitDocuments(rawDocs);

2) Embedding with OpenAI

  • Convert the text chunks into embeddings using OpenAI and store them in the vector store:

// OpenAIEmbeddings and QdrantVectorStore come from LangChain (import paths vary by version)
import { OpenAIEmbeddings } from "@langchain/openai";
import { QdrantVectorStore } from "@langchain/qdrant";

// Embed every chunk and upsert the vectors into the 'rag_bot' collection
const embeddings = new OpenAIEmbeddings();
const vectorStore = await QdrantVectorStore.fromDocuments(docs, embeddings, {
  client: qdrantClient,
  collectionName: 'rag_bot',
});

3) Storing in Vector DB (Qdrant)

  • I used a Docker Compose file to spin up a local, containerized instance of the Qdrant vector database, where all the document embeddings are stored and queried.

version: '3.8'
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6335:6333"
    volumes:
      - qdrant_storage:/qdrant/storage

volumes:
  qdrant_storage:
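
The qdrantClient used in the code snippets isn’t shown in this post. Here is a minimal sketch of how it could be created with the official JS client, assuming the Compose port mapping above (host port 6335 → container port 6333):

import { QdrantClient } from "@qdrant/js-client-rest";

// Connect to the local Qdrant instance exposed by Docker Compose on host port 6335
const qdrantClient = new QdrantClient({ url: "http://localhost:6335" });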
    

4) Query Embedding and Chunk Retrieval

Once the documents are uploaded and stored as vector embeddings in Qdrant, the next step is to handle user queries and retrieve relevant context chunks from the vector database.

🧠 Embedding the Query:

When a user types a question, the first step is to convert the query into a vector embedding using the same embedding model (like OpenAIEmbeddings) that was used for the documents.

// Reconnect to the existing 'rag_bot' collection instead of re-indexing the documents
const embeddings = new OpenAIEmbeddings();
const vectorStore = await QdrantVectorStore.fromExistingCollection(embeddings, {
  client: qdrantClient,
  collectionName: 'rag_bot',
});

📥 Similarity Search (Chunk Retrieval):

We then use the vector representation of the query to perform a similarity search within the Qdrant collection. This returns the top-k document chunks that are most semantically similar to the query.

const relevantDocs = await vectorStore.similaritySearch(userQuery, 3);

  • userQuery: The question asked by the user

  • 3: Number of relevant chunks to retrieve (you can tune this based on your use case)
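
Each retrieved result is a LangChain Document, with the chunk text in pageContent and source details in metadata. A quick, purely illustrative way to inspect what came back:

// Log each retrieved chunk and the metadata attached by the loader (e.g., source file, page info)
relevantDocs.forEach((doc, i) => {
  console.log(`Chunk ${i + 1}:`, doc.pageContent.slice(0, 120));
  console.log("Metadata:", doc.metadata);
});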

💬 Query Flow

Once the user submits a question, the backend follows these steps:

  1. Clean the Query

    • Trims and sanitizes the input for safety and token efficiency (a minimal cleaning sketch follows the code below).

  2. Generate Embedding

    • Converts the query into a vector using text-embedding-ada-002.

  3. Search Qdrant DB

    • Retrieves the top 3 relevant chunks from the specified collection.

  4. Format Context

    • Combines the retrieved chunks into a readable context block for GPT.

  5. Send to GPT

    • The query and context are sent to gpt-3.5-turbo via a system-guided prompt.

  6. Return Answer

    • GPT’s response is sent back to the frontend as the final answer.

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Retrieve the 3 most relevant chunks for the cleaned user query
const results = await vectorStore.similaritySearch(cleanedQuery, 3);

// Concatenate the retrieved chunks into a single context string
const context = results
  .map((res, i) => `Result ${i + 1}:\n${res.pageContent}\n`)
  .join("\n");

const SYSTEM_PROMPT = `You are a helpful assistant. Use the context below:\n\n${context}`;

// Send the system prompt (with context) plus the user's question to GPT
const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: cleanedQuery },
  ],
});

const answer = response.choices[0]?.message?.content || "No response.";
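
Step 1 of the flow (cleaning the query) isn’t shown above. Here is a minimal sketch of what it might look like, assuming the sanitization is just trimming and whitespace normalization (the project’s actual helper may differ):

// Hypothetical helper: trim and collapse whitespace before embedding, to save tokens
function cleanQuery(rawQuery) {
  return rawQuery.trim().replace(/\s+/g, " ");
}

const cleanedQuery = cleanQuery(userQuery);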


✅ Conclusion

Building this RAG-powered AI bot was more than just a technical experiment — it helped me dive deep into the core mechanics of Generative AI, vector embeddings, and retrieval-augmented architectures.

By combining tools like Next.js, OpenAI, and Qdrant, I was able to create a chatbot that doesn’t just "talk", but truly understands and responds based on real data — making it practical, scalable, and production-ready.

✨ Key Takeaways:

  • RAG bridges the gap between LLMs and external knowledge.

  • Chunking and embedding strategy matters as much as the model itself.

  • A well-structured backend and prompt control = better responses.

🔗 Feel free to check out the repository for codebase → https://github.com/Aarush18/Rag-Pdf-App
💬 Got questions or feedback? I’d love to hear from you!
