How to Build a RAG with LangChain.js, OpenAI, and Pinecone to Read a PDF File


🔍 What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of Large Language Models (LLMs) by integrating external data sources. Instead of relying solely on pre-trained knowledge, RAG systems retrieve relevant information from external databases or documents to provide more accurate and context-aware responses.

I have already discussed RAG and its comparison with fine-tuning in my earlier article, RAG Simplified: Smarter AI with Real-World Knowledge,

and the RAG architecture in How to Build a RAG-Powered Chatbot Using JavaScript and LangChain.

This article dives deeper into the code, with a link to the GitHub repository at the end.

🧰 Prerequisites

Before diving in, ensure you have the following:

  • Node.js installed on your machine.

  • API keys for OpenAI and Pinecone.

  • A PDF document you'd like to use for querying. I am using the "Sukanya Samriddhi Account Scheme" rules, which can be downloaded from here.

📦 Setting Up the Environment

Install the necessary packages:

npm install @langchain/community @langchain/core @langchain/openai @langchain/pinecone @langchain/textsplitters @pinecone-database/pinecone langchain pdf-parse dotenv

Create a .env file in your project root and add your API keys:

OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX=your_pinecone_index
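Note that the Pinecone index referenced by PINECONE_INDEX must already exist, and its dimension has to match the embedding size used later in this article (1024). If you have not created it yet, a minimal sketch using the Pinecone SDK could look like the following; the serverless cloud and region here are assumptions, so adjust them to your own setup:

import { Pinecone } from "@pinecone-database/pinecone";
import dotenv from "dotenv";
dotenv.config();

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });

// create a serverless index whose dimension matches the embeddings (1024)
await pinecone.createIndex({
  name: process.env.PINECONE_INDEX,
  dimension: 1024,
  metric: "cosine",
  spec: { serverless: { cloud: "aws", region: "us-east-1" } },
});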

⬆️ Loading and ✂️ Splitting the PDF

First, load your PDF document and split it into manageable chunks:

import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// ⬆️ load a document
const loader = new PDFLoader(
  "./data/docs/SukanyaSamriddhiAccountSchemeRule.pdf",
  {
    splitPages: false,
  }
);
const docs = await loader.load();
// console.log(docs);

// ✂️ split the document
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 150,
});
const chunks = await splitter.splitDocuments(docs);
console.log(`Split the document into ${chunks.length} sub-documents.`);
//Split the document into 14 sub-documents.

The console.log(docs) call printed an array of Document objects, each with a structure like the one below:

[
  Document {
    pageContent: "The text of the pdf file",
    metadata: {
      source: "./data/docs/SukanyaSamriddhiAccountSchemeRule.pdf", // relative path to the pdf file
      pdf: {
        version: '1.10.100',
        info: {
          PDFFormatVersion: '1.5',
          IsAcroFormPresent: false,
          IsXFAPresent: false,
          Author: 'nsiindia@outlook.com',
          Creator: 'Microsoft® Word 2010',
          Producer: 'Microsoft® Word 2010',
          CreationDate: "D:20210924172322+05'30'",
          ModDate: "D:20210924172322+05'30'"
        }
      }
    },
    id: undefined
  }
]

Here I have used RecursiveCharacterTextSplitter to keep larger units, such as paragraphs, together within one sub-document. You can try other types of splitters as well.

In this example the PDF is split into 14 sub-documents (chunks), each with the same Document structure shown above.

This process ensures that the document is divided into coherent sections, preserving context for better retrieval.
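As mentioned above, the splitter is interchangeable. For example, a minimal sketch using LangChain's simpler CharacterTextSplitter, which splits on a single fixed separator instead of recursing through several separators, might look like this (altSplitter and altChunks are just illustrative names):

import { CharacterTextSplitter } from "@langchain/textsplitters";

// split on paragraph boundaries with the same size settings as above
const altSplitter = new CharacterTextSplitter({
  separator: "\n\n",
  chunkSize: 1000,
  chunkOverlap: 150,
});
const altChunks = await altSplitter.splitDocuments(docs);
console.log(`CharacterTextSplitter produced ${altChunks.length} sub-documents.`);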

🔢 Generating Embeddings

Convert the text chunks into vector embeddings using OpenAI's embedding model:

import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import dotenv from "dotenv";
dotenv.config();

// 🔑 create embeddings
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
  apiKey: process.env.OPENAI_API_KEY,
  batchSize: 512, //max 2048
  dimensions: 1024,
});

// view the embedded docs
const embeddedDocs = await embeddings.embedDocuments(
  chunks.map((c) => c.pageContent)
);
console.log("๐Ÿš€ ~ embeddedDocs:", embeddedDocs);

The output embeddedDocs contains one vector embedding per chunk, sized according to the chosen model and the dimensions option.

These embeddings represent the semantic meaning of each text chunk, facilitating similarity searches later on.
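A quick sanity check is to log the shape of the result; with 14 chunks and dimensions: 1024, I would expect something like:

// one embedding per chunk, each a plain array of 1024 numbers
console.log(embeddedDocs.length);    // 14
console.log(embeddedDocs[0].length); // 1024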

🗃️ Storing Embeddings in Pinecone

Store the generated embeddings in Pinecone, a vector database optimized for similarity searches:

import { PineconeStore } from "@langchain/pinecone";
import { Pinecone } from "@pinecone-database/pinecone";

// initialize Pinecone
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.index(process.env.PINECONE_INDEX);

// instantiate VectorStore
const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
  pineconeIndex: index,
  // Maximum number of batch requests to allow at once. Each batch is 1000 vectors.
  maxConcurrency: 5,
});
const pineconeStore = await vectorStore.addDocuments(chunks);
console.log("๐Ÿš€ ~ pineconeStore:", pineconeStore);

The result of pineconeStore that is stored in pinecone is array of ids.

[
  'fd91d9b7-f46f-4d0b-834a-7ace58355666',
  'a763de73-f6fe-44be-998c-d0204a435495',
  '684bc40c-7b8d-4dda-9e2e-c2fdf9e380fa',
  '4f4d5d67-eb0d-4e7e-b743-9c00633f8ecc',
  'e2400a02-8408-486f-a9cf-9cbb4031fe79',
  'dd6cb2cd-2542-4a8f-9a15-dbeaa6e24b0d',
  '21f0d811-1e38-412f-8820-c28ec928754d',
  'a8d7b243-6b2e-4abb-a0ae-8a8229848e71',
  'ff0ac745-1558-44fe-96e5-5e57e5a44456',
  '4de1334b-b5e7-41d6-822e-7f90bb3638d4',
  '560ca1b6-2624-4078-8486-ba2a38178eb9',
  '547ecb7d-8140-410f-bb0a-e07fa24855f3',
  '449f19fa-f146-4cdd-9901-fb3b760c9077',
  'e4806b11-4b1d-463e-9dc6-02e013766829'
]

This setup enables efficient retrieval of relevant document sections based on user queries.

🔍 Retrieving Relevant Chunks

When a user poses a question, retrieve the most relevant document chunks:

// optional filter
const filter = {}; //match to metadata
const retriever = vectorStore.asRetriever({
  filter,
  k: 2, // no.of results
});

const query = "What is the age limit for account opening?";
const relevantDocs = await retriever.invoke(query);
console.log("๐Ÿš€ ~ relevantDocs:", relevantDocs);

Received 2 documents snippets where the relevant vector embeddings would be.
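The filter object is passed through to Pinecone as a metadata filter. As an illustration (the field and operator shown are assumptions based on the metadata stored earlier), you could restrict retrieval to chunks coming from a specific source file:

// only consider chunks whose metadata.source matches this file
const filteredRetriever = vectorStore.asRetriever({
  filter: { source: { $eq: "./data/docs/SukanyaSamriddhiAccountSchemeRule.pdf" } },
  k: 2,
});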

🧠 Generating Responses with OpenAI

Combine the retrieved documents with the user's query to generate a response using OpenAI's GPT model:

import { PromptTemplate } from "@langchain/core/prompts";
import {
  RunnableSequence,
  RunnablePassthrough,
} from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatOpenAI } from "@langchain/openai";

// Retrieval chain
const customPromptTemplate = `Answer the questions based only on the context provided.
If you don't know the answer, just say politely that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
{context}
Question: {question}
`;
const model = new ChatOpenAI({ temperature: 0.2, model: "gpt-4o-mini" });
const prompt = PromptTemplate.fromTemplate(customPromptTemplate);
const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);
const reply = await chain.invoke("Can I premature close the account ?");
console.log(reply);

The final output received is:

You may be able to prematurely close the account if there are extreme compassionate grounds, such as medical support for life-threatening diseases or the death of a guardian. Complete documentation must be provided to establish these grounds, and the accounts office must be satisfied. However, no premature closure can occur before certain conditions are met, which are not specified in the provided context.

RunnableSequence is a sequence of runnables, where the output of each is the input of the next.
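Conceptually, the same wiring can also be built step by step with .pipe(). The sketch below is equivalent in spirit to the chain above, reusing the relevantDocs fetched earlier (answerChain and answer are illustrative names):

// pipe the prompt into the model, then into the string output parser
const answerChain = prompt.pipe(model).pipe(new StringOutputParser());
const answer = await answerChain.invoke({
  context: formatDocumentsAsString(relevantDocs),
  question: "Can I premature close the account ?",
});
console.log(answer);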

🖼️ Visual Overview

Here's a simplified flow of the RAG system:

  1. PDF Document: Your source of information.

  2. Text Splitter: Divides the document into chunks.

  3. Embeddings Generator: Converts text chunks into vector embeddings.

  4. Pinecone Vector Store: Stores embeddings for efficient retrieval.

  5. Retriever: Fetches relevant chunks based on user queries.

  6. OpenAI LLM: Generates responses using the retrieved context.

🧑‍💻 Complete Code

import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { Pinecone } from "@pinecone-database/pinecone";
import { PromptTemplate } from "@langchain/core/prompts";
import {
  RunnableSequence,
  RunnablePassthrough,
} from "@langchain/core/runnables";
import dotenv from "dotenv";
import { formatDocumentsAsString } from "langchain/util/document";
import { StringOutputParser } from "@langchain/core/output_parsers";
dotenv.config();

// ⬆️ load a document
const loader = new PDFLoader(
  "./data/docs/SukanyaSamriddhiAccountSchemeRule.pdf",
  {
    splitPages: false,
  }
);

const docs = await loader.load();
// console.log(docs[0]?.metadata?.pdf);

// ✂️ split the document
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 150,
});
const chunks = await splitter.splitDocuments(docs);
// console.log("๐Ÿš€ ~ chunks:", chunks[1]);
console.log(`Split the document into ${chunks.length} sub-documents.`);

// 🔑 create embeddings
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
  apiKey: process.env.OPENAI_API_KEY,
  batchSize: 512, //max 2048
  dimensions: 1024,
});
/* const embeddedDocs = await embeddings.embedDocuments(
  chunks.map((c) => c.pageContent)
);
console.log("๐Ÿš€ ~ embeddedDocs:", embeddedDocs); */

// initialize Pinecone
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.index(process.env.PINECONE_INDEX);

// instantiate VectorStore
const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
  pineconeIndex: index,
  // Maximum number of batch requests to allow at once. Each batch is 1000 vectors.
  maxConcurrency: 5,
});
/* const pineconeStore = await vectorStore.addDocuments(chunks);
console.log("๐Ÿš€ ~ pineconeStore:", pineconeStore);
 */
// optional filter
const filter = {}; //match to metadata
const retriever = vectorStore.asRetriever({
  filter,
  k: 2, // no.of results
});

const query = "What is the age limit for account opening?";
const relevantDocs = await retriever.invoke(query);
console.log("๐Ÿš€ ~ relevantDocs:", relevantDocs);

// Retrireval chain
const customPromptTemplate = `Answer the quetions based only on the context provided.
If you don't know the answer, just say politely that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
{context}
Question: {question}
`;
const model = new ChatOpenAI({ temperature: 0.2, model: "gpt-4o-mini" });
const prompt = PromptTemplate.fromTemplate(customPromptTemplate);
const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);
const reply = await chain.invoke("Can I premature close the account ?");
console.log(reply);

The complete code is available on GitHub: https://github.com/Devendra616/cohort-rag-pdf
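Because the script uses ES module imports and top-level await, it must run as an ES module: add "type": "module" to your package.json (or use an .mjs extension). Assuming the code is saved as index.js (the file name is my assumption), run it with:

node index.js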

🚀 Conclusion

By integrating LangChain.js, OpenAI, and Pinecone, we have now built a robust RAG system capable of providing accurate, context-aware responses based on PDF documents. This setup is versatile and can be extended to other document types or data sources as needed.
