Building a RAG-Powered Q&A Web App: A Journey to Smarter Search

Akash Maurya
5 min read

Project Title: Retrieval-Augmented Generation (RAG) Powered Q&A Web Application

🛠️ Tech Stack

  • Frontend: Next.js, React, TypeScript, Tailwind CSS, shadcn/ui

  • Backend: Node.js, Express

  • Queue System: BullMQ, Valkey (Redis-compatible)

  • Vector Database: Qdrant

  • Authentication: Clerk

  • LLM: OpenAI (for Q&A generation)

  • Cache Management: Redis (via Valkey)

Overview: The web application utilizes a Retrieval-Augmented Generation (RAG) architecture to build an intelligent question-answering (Q&A) system. The main use case is to allow users to ask questions about a custom knowledge base, and the system leverages Qdrant for vector search and OpenAI for sophisticated natural language responses.

Key Features:

  1. Q&A System: The app fetches answers based on vector similarity search from Qdrant and then uses OpenAI for final response generation.

  2. Queue System (BullMQ): BullMQ is used to handle background tasks such as vector embedding, data indexing, OpenAI completions, and cache synchronization. This improves application scalability and responsiveness by offloading time-consuming tasks from the main API request/response cycle.

  3. Redis Caching: Valkey manages the Redis-compatible cache, storing frequently requested data to optimize response times.

  4. Authentication: User authentication is handled by Clerk, ensuring a seamless login experience.

Architecture Overview:

  1. Frontend (Next.js)

    • Next.js provides server-side rendering (SSR) for improved SEO and fast initial load.

    • React handles the interactive UI components.

    • Tailwind CSS ensures a responsive, clean design.

  2. Backend (Node.js)

    • The Node.js backend handles the business logic, data retrieval, and background job management.

    • Express is used for routing and middleware handling.

    • BullMQ is used for queue management, running background jobs like data processing, embeddings, and OpenAI completion.

  3. Queue System (BullMQ)

    • BullMQ is used to manage and process background tasks, such as:

      • Vector embedding and indexing with Qdrant.

      • OpenAI API calls for generating intelligent responses.

      • Caching data in Redis to optimize API performance.

    • Workers subscribe to BullMQ queues, handle tasks asynchronously, and retry in case of failure.

  4. Redis and Valkey

    • Valkey connects seamlessly as the Redis-compatible store used for caching frequently requested data, improving response times for user queries.

    • Redis ensures that repeated tasks like fetching embeddings or user-related data are efficiently served.

  5. Qdrant

    • Qdrant is the vector search engine used for storing and retrieving vectorized representations of data, allowing the application to match user queries with relevant information from the knowledge base.
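The caching piece of this flow can be sketched with a simple cache-aside pattern. The snippet below is a minimal illustration only: a plain Map stands in for Redis/Valkey, and computeAnswer is a hypothetical placeholder for the Qdrant + OpenAI pipeline.

```javascript
// Minimal cache-aside sketch: a Map stands in for Redis/Valkey,
// and computeAnswer() stands in for the Qdrant + OpenAI pipeline.
const cache = new Map();

// Hypothetical placeholder for the expensive RAG pipeline call.
async function computeAnswer(question) {
  return `answer for: ${question}`;
}

async function answerWithCache(question) {
  const key = `qa:${question}`;
  if (cache.has(key)) {
    // Cache hit: serve the stored answer without re-running the pipeline.
    return { answer: cache.get(key), cached: true };
  }
  // Cache miss: run the pipeline, then store the result for next time.
  const answer = await computeAnswer(question);
  cache.set(key, answer);
  return { answer, cached: false };
}
```

In the real app, the Map operations would be Redis GET/SET calls (typically with a TTL) over the Valkey-managed connection.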

📚 Features

✅ User Authentication via Clerk

  • Secure authentication and authorization using Clerk to handle user sessions and sign-ins seamlessly.


🔎 Document-based Question Answering

  • Enables users to ask questions and retrieve answers from uploaded PDFs using Retrieval-Augmented Generation (RAG).

🧠 RAG Pipeline Using LangChain + Qdrant

  • Integrates LangChain for orchestrating the pipeline and QdrantDB for efficient vector-based document search, allowing fast retrieval of semantically similar text chunks.

💬 Chat-like Interface with Context Retention

  • Offers a conversational Q&A interface that remembers the context of previous interactions, enabling a smooth user experience.

🌐 Responsive UI Built with Tailwind

  • A sleek user interface using Tailwind CSS, ensuring great usability across devices.

🔁 Background Task Processing with BullMQ

  • Handles long-running background tasks like document chunking and embedding using BullMQ, enabling efficient processing without blocking the main application.

⚡ Optimized Performance with Redis and Valkey

  • Utilizes Redis and Valkey for managing in-memory data and optimizing overall performance, particularly for tasks involving queues and caching.

System Design & Flow

Document Processing Workflow:

  1. Reading PDF File:

    • The first step is extracting text from the PDF document. This can be done with libraries like pdf-parse or pdf.js; here, LangChain's PDFLoader handles it.

    • The content of the PDF is read and stored in memory as a raw string.

    import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";

    // Load the uploaded PDF; the file path arrives in the BullMQ job payload
    const loader = new PDFLoader(job.data.path);
    const docs = await loader.load();
  2. Chunking the PDF:

    • The document is then split into smaller chunks for further processing.

    • A chunk is typically a paragraph or a set number of characters, depending on your design choice.

    • This chunking ensures that:

      • The content remains manageable for embedding.

      • Each chunk corresponds to a meaningful section of information for later retrieval.

import { CharacterTextSplitter } from "langchain/text_splitter";

// Chunking and splitting the loaded documents
const splitter = new CharacterTextSplitter({
  chunkSize: 300,
  chunkOverlap: 1,
});

const splitDocs = await splitter.splitDocuments(docs);
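To make the chunking step concrete, here is a minimal sketch of what fixed-size chunking with overlap does, in plain JavaScript (the app itself uses LangChain's CharacterTextSplitter; the sizes below are illustrative):

```javascript
// Illustrative fixed-size chunker with overlap. The real app uses
// CharacterTextSplitter with chunkSize: 300 on LangChain documents.
function chunkText(text, chunkSize, overlap) {
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Overlap means the tail of one chunk is repeated at the head of the next, so text cut at a chunk boundary still appears intact in at least one chunk.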
  3. OpenAI Embedding:

    • Once a chunk is created, the OpenAI embedding model is called to generate the vector representation for that chunk.

    • The OpenAI model maps the textual content into a vector (array of floating-point numbers) that captures the semantic meaning of the text.

    • Example API call to OpenAI:

        import { OpenAIEmbeddings } from '@langchain/openai';

        const embedding = new OpenAIEmbeddings({
          model: 'text-embedding-3-small',
          apiKey: process.env.OPENAI_API_KEY, // keep the key in an env variable, not in code
        });
      
  4. Storing in QdrantDB:

    • After embedding the chunk, the resulting vector and chunk metadata (like the original text) are stored in QdrantDB for fast retrieval.

    • Qdrant allows storing, indexing, and querying vectors for similarity-based search.

    • The vector and associated metadata (text of the chunk) are inserted into the database.

    • Example insertion into Qdrant:

        import { QdrantVectorStore } from '@langchain/qdrant';

        // Connect to an existing Qdrant collection. The URL here is the
        // Qdrant server URL (e.g. http://localhost:6333), not a Redis URL.
        const vectorStore = await QdrantVectorStore.fromExistingCollection(
          embedding,
          {
            url: process.env.QDRANT_URL,
            collectionName: 'pdf-docs',
          }
        );

        // Store the split chunks (with their embeddings) in Qdrant
        await vectorStore.addDocuments(splitDocs);
      
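At query time the flow runs in reverse: the user's question is embedded with the same model, and Qdrant returns the stored chunks whose vectors are closest to the query vector (via LangChain, `vectorStore.similaritySearch(question, k)`). The ranking idea can be sketched with cosine similarity over toy 3-dimensional vectors; real text-embedding-3-small vectors have 1536 dimensions.

```javascript
// Cosine similarity: conceptually how a vector store ranks stored
// chunk vectors against the embedded query vector.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank stored chunks by similarity to the query vector, keep the top k.
function topK(queryVector, chunks, k) {
  return chunks
    .map((c) => ({ ...c, score: cosine(queryVector, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In production, Qdrant performs this ranking server-side with an approximate nearest-neighbor index, so the application never iterates over all vectors itself.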

Why Use a Separate Node.js Backend:

While Next.js offers the ability to handle backend API routes, we opted to use a separate Node.js backend for the following reasons:

  1. Handling Long-Running Background Jobs (BullMQ): Next.js API routes are typically deployed as serverless functions that terminate once a request is processed. Tasks like vector embedding, OpenAI API calls, and Redis synchronization can be long-running, so they are better handled in a persistent Node.js process. This also gives more control over job processing and scaling.

  2. Separation of Concerns: Using a separate Node.js backend allows for a clear separation between the frontend and backend logic. This is particularly helpful for scalability, maintainability, and future expansion (e.g., migrating to microservices).

  3. Queue Management: With BullMQ, background workers are more effectively managed in a persistent Node.js server. This gives us the ability to run job workers independently and handle task retries, rate limiting, and concurrency, which would be challenging to implement in Next.js alone.

  4. Optimized Performance: By offloading heavy computations and API calls to background workers managed by BullMQ in Node.js, the system remains responsive and scalable even with a growing user base and data volume.
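The retry behavior mentioned in point 3 is something BullMQ provides via its `attempts` and `backoff` job options; the sketch below only illustrates the underlying idea of re-running a failed job a bounded number of times.

```javascript
// Minimal retry loop illustrating what a BullMQ job's `attempts`
// option does: re-run a failing job up to maxAttempts times.
async function runWithRetry(job, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { result: await job(), attempts: attempt };
    } catch (err) {
      // Out of attempts: surface the last error to the caller.
      if (attempt === maxAttempts) throw err;
      // A real worker would also apply a backoff delay here.
    }
  }
}
```

Unlike this in-memory loop, BullMQ persists jobs in Redis, so retries survive worker restarts, which is exactly why a persistent Node.js process suits this workload better than serverless API routes.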


Written by

Akash Maurya

I am a Software Associate Engineer at XRG Consulting Pvt. Ltd., where I contribute to backend development, user authentication, employee onboarding, and database optimization. I have a deep understanding of TypeScript, Prisma, and OAuth/OIDC authentication and work extensively on server-side performance improvements. As a MERN Stack Developer, I specialize in building scalable, high-performance applications and optimizing infrastructure for seamless deployment. With hands-on experience in MongoDB, Express.js, React, and Node.js, I design robust backends, create efficient frontend architectures, and streamline deployment workflows using Docker and CI/CD pipelines. I am passionate about writing clean, maintainable code and implementing DevOps best practices to enhance system reliability and scalability. Always eager to learn, I thrive in fast-paced environments where I can leverage my problem-solving skills to build cutting-edge solutions. 🔹 Key Skills: MERN Stack | Node.js | TypeScript | MongoDB | Prisma | Docker | DevOps | OAuth & OIDC | Cloud Infrastructure | CI/CD | API Development | Scalable Systems