How I Built My First RAG App Using LLMs to Simplify Developer Docs

Ayush Dixit
5 min read

We all face the same issue when reading technical documentation: it is full of jargon. One day, I had an idea: since we use LLMs daily to understand concepts and solve problems, why not use them to simplify technical docs? So, I came up with this approach:

How My Approach Works

LLMs have a knowledge cutoff, which means they can be outdated when it comes to certain technical documentation. So I asked: why not web-scrape the targeted documentation to get the most recent content and feed it into the LLM for simplification? The model can then break down the jargon and make the content easier to understand, even providing helpful code snippets.
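To make the scraping step concrete, here is a minimal sketch of what a scrapeDocs helper could look like, assuming Node 18+ (for the built-in fetch) and cheerio for HTML parsing. My actual helper differs, so treat the names and selectors as illustrative:

import * as cheerio from "cheerio";

// Hypothetical sketch of the scraping step; not the exact helper from my app.
export async function scrapeDocs(url: string): Promise<string> {
  const response = await fetch(url); // global fetch, Node 18+
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: ${response.status}`);
  }

  const $ = cheerio.load(await response.text());

  // Strip elements that carry no documentation content
  $("script, style, nav, header, footer").remove();

  // Return the page text with whitespace collapsed
  return $("body").text().replace(/\s+/g, " ").trim();
}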

Challenges With This Approach

There's a chance that the documentation contains a large amount of data, which means the scraped content will also be extensive. Feeding such a large dataset into an LLM can be inefficient and may slow down both the LLM and the backend's response time. Additionally, after processing, we need to store this data somewhere so that the LLM has some form of memory, enabling it to answer follow-up questions effectively.

My Solution to the Problem

To tackle this problem, I scraped the full content and applied a limit to the amount of data passed to the LLM. This raises a question: did I truly solve the issue? The answer is somewhere between yes and no. For real-time processing, the LLM may not have access to the entire document due to the imposed limit.

To address this, I created vector embeddings of the entire scraped dataset (not just the limited portion sent to the LLM) and stored them in Qdrant, an open-source vector database. This allows me to handle follow-up questions effectively: when a user asks a follow-up, I generate a vector embedding of their question, perform a similarity search in the vector DB, and fetch the most relevant content.
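To illustrate that flow, here is a hedged sketch of a follow-up handler. It reuses the embeddings and qdrantClient instances from the main handler below; the function name, result limit, and filter key are my own placeholders:

// Illustrative sketch of answering a follow-up question; not the exact
// handler from my app.
async function answerFollowUp(chatId: string, question: string) {
  // Turn the follow-up question into a vector
  const queryVector = await embeddings.embedQuery(question);

  // Find the most similar stored chunks, scoped to this chat
  const hits = await qdrantClient.search("chats_docs_chunks", {
    vector: queryVector,
    limit: 5,
    with_payload: true,
    // Chunks written through LangChain keep their metadata under
    // payload.metadata, hence the "metadata.chatId" key here
    filter: {
      must: [{ key: "metadata.chatId", match: { value: chatId } }],
    },
  });

  // The matched chunk texts become the context for the LLM's answer
  return hits.map((hit) => hit.payload?.content).filter(Boolean);
}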

This approach helps me efficiently manage both challenges: preventing the LLM from being overloaded and ensuring the data is already prepared and accessible for accurate responses.

Here’s the code I used in my app:

// Library imports (exact paths may vary with your package versions):
import { Request, Response } from "express";
import { v4 as uuidv4 } from "uuid";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { QdrantVectorStore } from "@langchain/qdrant";
// asyncHandler, scrapeDocs, googleAiClient, qdrantClient, embeddings,
// Chat, Message, and CustomError come from this app's own modules.

export const getDocsSimplified = asyncHandler(
  async (req: Request, res: Response) => {
    const { chatId } = req.params;
    const { prompt } = req.body;

    if (!prompt) {
      res.status(400).json({ error: "Prompt is required", success: false });
      return;
    }

    const httpsUrlRegex = /^https:\/\/[^\s/$.?#].[^\s]*$/i;

    if (!httpsUrlRegex.test(prompt)) {
      res.status(400).json({
        error: "Prompt must be a valid HTTPS URL",
        success: false,
      });
      return;
    }

    try {
      // Scrape the documentation content from the URL in the prompt
      const scrapedData = await scrapeDocs(prompt);

      // Rough character cap so the request stays within a practical input size
      const maxInputLength = 16000;

      const trimmedInput =
        scrapedData.length > maxInputLength
          ? scrapedData.slice(0, maxInputLength)
          : scrapedData;

      const stream = await googleAiClient.models.generateContentStream({
        model: "gemini-2.0-flash",
        contents: trimmedInput || prompt, // fall back to the raw URL if scraping returned nothing
        config: {
          temperature: 0.2,
          systemInstruction: `You are a technical assistant specializing in simplifying technical documentation for beginners.
           Your task:
             - Carefully read and understand the scraped content provided.
             - Summarize only the important and relevant parts in clear, beginner-friendly language.
             - Use simple, real-world analogies where helpful.
             - Provide simple code examples in relevant programming languages to illustrate concepts.
             - Format your response using bullet points and code blocks when appropriate.
             - Avoid technical jargon and overly complex explanations.
             - If the scraped content does not contain enough information, answer based on your own knowledge, but state when you are doing so.
             - Do not provide unrelated information or speculation.
             - Keep the response concise and focused on teaching.

              Respond clearly and helpfully.`,
        },
      });

      // Accumulate the streamed reply so it can be saved and embedded later
      let fullResponse = "";

      res.setHeader("Content-Type", "text/plain");
      res.setHeader("Transfer-Encoding", "chunked");

      try {
        for await (const chunk of stream) {
          if (chunk.text) {
            res.write(chunk.text);
            fullResponse += chunk.text;
          }
          // res.flush is available when compression middleware is in use
          if (res.flush) res.flush();
        }
        res.end();
      } catch (streamError) {
        // If the stream fails midway, log and close the response instead of hanging
        console.error("Streaming error:", streamError);
        if (!res.writableEnded) {
          res.end();
        }
      }
      const chat = await Chat.findById(chatId);

      if (!chat) {
        throw new CustomError("Chat not found!", 400);
      }

      const userMessage = new Message({
        role: "user",
        content: prompt,
        chatId: chat._id,
      });

      const systemMessage = new Message({
        role: "assistant",
        content: fullResponse,
        chatId: chat._id,
      });
      await userMessage.save();
      await systemMessage.save();

      chat.messages.push(userMessage._id, systemMessage._id);

      await chat.save();

      // Split the full scraped text into overlapping chunks for embedding
      const textSplitter = new RecursiveCharacterTextSplitter({
        chunkSize: 500,
        chunkOverlap: 50,
      });
      const texts = await textSplitter.splitText(scrapedData);
      // Tag every chunk with the chat ID so follow-up searches can be scoped
      const payloads = texts.map(() => ({ chatId: chat._id.toString() }));

      // Embed the chunks and store them in Qdrant for later similarity search
      await QdrantVectorStore.fromTexts(texts, payloads, embeddings, {
        client: qdrantClient,
        collectionName: "chats_docs_chunks",
      });

      // Also embed and store the assistant's reply so follow-ups can reference it
      const replyEmbedding = await embeddings.embedQuery(fullResponse);
      await qdrantClient.upsert("chats_docs_chunks", {
        points: [
          {
            id: uuidv4(),
            vector: replyEmbedding,
            payload: {
              chatId: chatId.toString(),
              type: "assistant_response",
              content: string,
              timestamp: new Date().toISOString(),
            },
          },
        ],
      });
    } catch (error: unknown) {
      console.error("getDocsSimplified error:", error);
      // The stream may have already been sent; only respond with JSON if not
      if (!res.headersSent) {
        res
          .status(500)
          .json({ error: "Failed to generate response", success: false });
      }
    }
  }
);
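For context, this is an Express route handler that expects the chat ID in the path and the docs URL in the request body, so it would be mounted roughly like this (the path itself is my own placeholder):

import { Router } from "express";

const router = Router();

// POST /api/chats/:chatId/docs with { "prompt": "https://..." } in the body
router.post("/api/chats/:chatId/docs", getDocsSimplified);

export default router;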

My Learnings While Building This Project

I learned a lot while working on this project. I explored concepts like vector databases, embeddings, and various options for LLMs. Initially, I was working with Ollama, but later switched to Google Gemini. Along the way, I also gained insights into agentic workflows and some core concepts of LangChain.

Conclusion

This project started with a simple idea: using LLMs to make technical documentation more understandable. Along the way, I faced real-world challenges like outdated model knowledge, data size limitations, and the need for follow-up context. By combining web scraping, vector embeddings, and a vector database like Qdrant, I was able to create a system that not only simplifies technical content but also responds intelligently to user queries.

The journey taught me a lot — from working with different LLMs and embeddings to exploring agentic workflows and LangChain concepts. This project proved how powerful and flexible AI can be when paired with the right tools and strategies. And more importantly, it highlighted the value of persistence and creative problem-solving in real-world development.


Written by

Ayush Dixit

I'm a passionate MERN stack developer who loves building websites and apps. I enjoy solving problems and bringing ideas to life through code. I believe in learning something new every day to improve my skills and keep up with the latest technologies. I’m always excited to work with other developers, share knowledge, and contribute to open-source projects. Let’s connect and create something great together!