Building a RAG-Powered Q&A Web App: A Journey to Smarter Search

Akash Maurya
5 min read

Project Title: Retrieval-Augmented Generation (RAG) Powered Q&A Web Application

🛠️ Tech Stack

  • Frontend: Next.js, React, TypeScript, Tailwind CSS, shadcn/ui

  • Backend: Node.js, Express

  • Queue System: BullMQ, Valkey (Redis-compatible)

  • Vector Database: Qdrant

  • Authentication: Clerk

  • LLM: OpenAI (for Q&A generation)

  • Cache Management: Redis (via Valkey)

Overview: The web application utilizes a Retrieval-Augmented Generation (RAG) architecture to build an intelligent question-answering (Q&A) system. The main use case is to allow users to ask questions about a custom knowledge base, and the system leverages Qdrant for vector search and OpenAI for sophisticated natural language responses.

Key Features:

  1. Q&A System: The app fetches answers based on vector similarity search from Qdrant and then uses OpenAI for final response generation.

  2. Queue System (BullMQ): BullMQ is used to handle background tasks such as vector embedding, data indexing, OpenAI completions, and cache synchronization. This improves application scalability and responsiveness by offloading time-consuming tasks from the main API request/response cycle.

  3. Redis Caching: Valkey manages the Redis-compatible cache, storing frequently requested data to optimize response times.

  4. Authentication: User authentication is handled by Clerk, ensuring a seamless login experience.

Architecture Overview:

  1. Frontend (Next.js)

    • Next.js provides server-side rendering (SSR) for improved SEO and fast initial load.

    • React handles the interactive UI components.

    • Tailwind CSS ensures a responsive, clean design.

  2. Backend (Node.js)

    • The Node.js backend handles the business logic, data retrieval, and background job management.

    • Express is used for routing and middleware handling.

    • BullMQ is used for queue management, running background jobs like data processing, embeddings, and OpenAI completion.

  3. Queue System (BullMQ)

    • BullMQ is used to manage and process background tasks, such as:

      • Vector embedding and indexing with Qdrant.

      • OpenAI API calls for generating intelligent responses.

      • Caching data in Redis to optimize API performance.

    • Workers subscribe to BullMQ queues, handle tasks asynchronously, and retry in case of failure.

  4. Redis and Valkey

    • Valkey connects seamlessly as the Redis-compatible store used for caching frequently requested data, improving response times for user queries.

    • Redis ensures that repeated tasks like fetching embeddings or user-related data are efficiently served.

  5. Qdrant

    • Qdrant is the vector search engine used for storing and retrieving vectorized representations of data, allowing the application to match user queries with relevant information from the knowledge base.
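The caching piece of this flow can be sketched with a simple cache-aside pattern. The snippet below is a minimal illustration only: a plain Map stands in for Redis/Valkey, and computeAnswer is a hypothetical placeholder for the Qdrant + OpenAI pipeline.

```javascript
// Minimal cache-aside sketch: a Map stands in for Redis/Valkey,
// and computeAnswer() stands in for the Qdrant + OpenAI pipeline.
const cache = new Map();

// Hypothetical placeholder for the expensive RAG pipeline call.
async function computeAnswer(question) {
  return `answer for: ${question}`;
}

async function answerWithCache(question) {
  const key = `qa:${question}`;
  if (cache.has(key)) {
    // Cache hit: serve the stored answer without re-running the pipeline.
    return { answer: cache.get(key), cached: true };
  }
  // Cache miss: run the pipeline, then store the result for next time.
  const answer = await computeAnswer(question);
  cache.set(key, answer);
  return { answer, cached: false };
}
```

In the real app, the Map operations would be Redis GET/SET calls (typically with a TTL) over the Valkey-managed connection.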

📚 Features

✅ User Authentication via Clerk

  • Secure authentication and authorization using Clerk to handle user sessions and sign-ins seamlessly.


🔎 Document-based Question Answering

  • Enables users to ask questions and retrieve answers from uploaded PDFs using Retrieval-Augmented Generation (RAG).

🧠 RAG Pipeline Using LangChain + Qdrant

  • Integrates LangChain for orchestrating the pipeline and QdrantDB for efficient vector-based document search, allowing fast retrieval of semantically similar text chunks.

💬 Chat-like Interface with Context Retention

  • Offers a conversational Q&A interface that remembers the context of previous interactions, enabling a smooth user experience.

🌐 Responsive UI Built with Tailwind

  • A sleek user interface using Tailwind CSS, ensuring great usability across devices.

🔁 Background Task Processing with BullMQ

  • Handles long-running background tasks like document chunking and embedding using BullMQ, enabling efficient processing without blocking the main application.

⚡ Optimized Performance with Redis and Valkey

  • Utilizes Redis and Valkey for managing in-memory data and optimizing overall performance, particularly for tasks involving queues and caching.

System Design & Flow

Document Processing Workflow:

  1. Reading PDF File:

    • The first step is extracting text from the PDF document. This can be done with libraries like pdf-parse or pdf.js; here, LangChain's PDFLoader handles it.

    • The content of the PDF is read and stored in memory as a raw string.

    import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

    // Load the uploaded PDF; the file path arrives in the BullMQ job payload
    const loader = new PDFLoader(job.data.path);
    const docs = await loader.load();
  2. Chunking the PDF:

    • The document is then split into smaller chunks for further processing.

    • A chunk is typically a paragraph or a set number of characters, depending on your design choice.

    • This chunking ensures that:

      • The content remains manageable for embedding.

      • Each chunk corresponds to a meaningful section of information for later retrieval.

import { CharacterTextSplitter } from "langchain/text_splitter";

// Chunking and splitting the loaded documents
const splitter = new CharacterTextSplitter({
  chunkSize: 300,
  chunkOverlap: 1,
});

const splitDocs = await splitter.splitDocuments(docs);
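To make the chunking step concrete, here is a minimal sketch of what fixed-size chunking with overlap does, in plain JavaScript (the app itself uses LangChain's CharacterTextSplitter; the sizes below are illustrative):

```javascript
// Illustrative fixed-size chunker with overlap. The real app uses
// CharacterTextSplitter with chunkSize: 300 on LangChain documents.
function chunkText(text, chunkSize, overlap) {
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Overlap means the tail of one chunk is repeated at the head of the next, so text cut at a chunk boundary still appears intact in at least one chunk.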
  3. OpenAI Embedding:

    • Once a chunk is created, the OpenAI embedding model is called to generate the vector representation for that chunk.

    • The OpenAI model maps the textual content into a vector (array of floating-point numbers) that captures the semantic meaning of the text.

    • Example API call to OpenAI:

        import { OpenAIEmbeddings } from '@langchain/openai';

        const embedding = new OpenAIEmbeddings({
          model: 'text-embedding-3-small',
          apiKey: process.env.OPENAI_API_KEY, // keep the key in an env variable, not in code
        });
      
  4. Storing in QdrantDB:

    • After embedding the chunk, the resulting vector and chunk metadata (like the original text) are stored in QdrantDB for fast retrieval.

    • Qdrant allows storing, indexing, and querying vectors for similarity-based search.

    • The vector and associated metadata (text of the chunk) are inserted into the database.

    • Example insertion into Qdrant:

        import { QdrantVectorStore } from '@langchain/qdrant';

        // Connect to an existing Qdrant collection. The URL here is the
        // Qdrant server URL (e.g. http://localhost:6333), not a Redis URL.
        const vectorStore = await QdrantVectorStore.fromExistingCollection(
          embedding,
          {
            url: process.env.QDRANT_URL,
            collectionName: 'pdf-docs',
          }
        );

        // Store the split chunks (with their embeddings) in Qdrant
        await vectorStore.addDocuments(splitDocs);
      
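At query time the flow runs in reverse: the user's question is embedded with the same model, and Qdrant returns the stored chunks whose vectors are closest to the query vector (via LangChain, `vectorStore.similaritySearch(question, k)`). The ranking idea can be sketched with cosine similarity over toy 3-dimensional vectors; real text-embedding-3-small vectors have 1536 dimensions.

```javascript
// Cosine similarity: conceptually how a vector store ranks stored
// chunk vectors against the embedded query vector.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks by similarity to the query vector, keep the top k.
function topK(queryVector, chunks, k) {
  return chunks
    .map((c) => ({ ...c, score: cosine(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In production, Qdrant performs this ranking server-side with an approximate nearest-neighbor index, so the application never iterates over all vectors itself.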

Why Use a Separate Node.js Backend:

While Next.js offers the ability to handle backend API routes, we opted to use a separate Node.js backend for the following reasons:

  1. Handling Long-Running Background Jobs (BullMQ): Next.js API routes are typically deployed as serverless functions that terminate once a request is processed. Tasks like vector embedding, OpenAI API calls, and Redis synchronization can be long-running, so they are better handled in a persistent Node.js process. This also gives more control over job processing and scaling.

  2. Separation of Concerns: Using a separate Node.js backend allows for a clear separation between the frontend and backend logic. This is particularly helpful for scalability, maintainability, and future expansion (e.g., migrating to microservices).

  3. Queue Management: With BullMQ, background workers are more effectively managed in a persistent Node.js server. This gives us the ability to run job workers independently and handle task retries, rate limiting, and concurrency, which would be challenging to implement in Next.js alone.

  4. Optimized Performance: By offloading heavy computations and API calls to background workers managed by BullMQ in Node.js, the system remains responsive and scalable even with a growing user base and data volume.
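The retry behavior mentioned in point 3 is something BullMQ provides via its `attempts` and `backoff` job options; the sketch below only illustrates the underlying idea of re-running a failed job a bounded number of times.

```javascript
// Minimal retry loop illustrating what a BullMQ job's `attempts`
// option does: re-run a failing job up to maxAttempts times.
async function runWithRetry(job, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { result: await job(), attempts: attempt };
    } catch (err) {
      // Out of attempts: surface the last error to the caller.
      if (attempt === maxAttempts) throw err;
      // A real worker would also apply a backoff delay here.
    }
  }
}
```

Unlike this in-memory loop, BullMQ persists jobs in Redis, so retries survive worker restarts, which is exactly why a persistent Node.js process suits this workload better than serverless API routes.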


Written by

Akash Maurya

I am a Software Associate Engineer at XRG Consulting Pvt. Ltd., where I contribute to backend development, user authentication, employee onboarding, and database optimization. I have a deep understanding of TypeScript, Prisma, and OAuth/OIDC authentication and work extensively on server-side performance improvements. As a MERN Stack Developer, I specialize in building scalable, high-performance applications and optimizing infrastructure for seamless deployment. With hands-on experience in MongoDB, Express.js, React, and Node.js, I design robust backends, create efficient frontend architectures, and streamline deployment workflows using Docker and CI/CD pipelines. I am passionate about writing clean, maintainable code and implementing DevOps best practices to enhance system reliability and scalability. Always eager to learn, I thrive in fast-paced environments where I can leverage my problem-solving skills to build cutting-edge solutions. 🔹 Key Skills: MERN Stack | Node.js | TypeScript | MongoDB | Prisma | Docker | DevOps | OAuth & OIDC | Cloud Infrastructure | CI/CD | API Development | Scalable Systems