How to build a RAG with LangChain.js, OpenAI, and Pinecone to read a PDF file

Table of contents
- What is RAG?
- Prerequisites
- Setting Up the Environment
- Loading and Splitting the PDF
- Generating Embeddings
- Storing Embeddings in Pinecone
- Retrieving Relevant Chunks
- Generating Responses with OpenAI
- Visual Overview
- Complete Code
- Git Link
- Conclusion

What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of Large Language Models (LLMs) by integrating external data sources. Instead of relying solely on pre-trained knowledge, RAG systems retrieve relevant information from external databases or documents to provide more accurate and context-aware responses.
I have already discussed RAG and compared it with fine-tuning in an earlier article, RAG Simplified: Smarter AI with Real-World Knowledge, and covered the RAG architecture in How to Build a RAG-Powered Chatbot Using JavaScript and LangChain.
This article dives deeper into the code, with a GitHub link at the end.
Prerequisites
Before diving in, ensure you have the following:
Node.js installed on your machine.
API keys for OpenAI and Pinecone.
A PDF document you'd like to query. I am using the "Sukanya Samriddhi Account Scheme" rules, which can be downloaded from here.
Setting Up the Environment
Install the necessary packages:
npm install @langchain/community @langchain/core @langchain/openai @langchain/pinecone @langchain/textsplitters @pinecone-database/pinecone langchain pdf-parse dotenv
Create a .env file in your project root and add your API keys:
OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
PINECONE_INDEX=your_pinecone_index
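Note that the index named in PINECONE_INDEX must already exist in Pinecone, and its dimension must match the embedding size used later in this article (1024). You can create it from the Pinecone dashboard, or with a small script like the sketch below; the index name, cloud, and region are placeholders, so adjust them to your own setup.
import { Pinecone } from "@pinecone-database/pinecone";
import dotenv from "dotenv";
dotenv.config();

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });

// Create a serverless index whose dimension matches the embeddings (1024 here).
await pinecone.createIndex({
  name: "rag-pdf-demo", // placeholder; use the value of PINECONE_INDEX
  dimension: 1024,
  metric: "cosine",
  spec: { serverless: { cloud: "aws", region: "us-east-1" } },
});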
Loading and Splitting the PDF
First, load your PDF document and split it into manageable chunks:
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// load a document
const loader = new PDFLoader(
  "./data/docs/SukanyaSamriddhiAccountSchemeRule.pdf",
  {
    splitPages: false,
  }
);
const docs = await loader.load();
// console.log(docs);

// split the document
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 150,
});
const chunks = await splitter.splitDocuments(docs);
console.log(`Split the document into ${chunks.length} sub-documents.`);
// Split the document into 14 sub-documents.
The console.log(docs) call prints an array of Document objects with a structure like the one below:
[
  Document {
    pageContent: "The text of the pdf file",
    metadata: {
      source: "./data/docs/SukanyaSamriddhiAccountSchemeRule.pdf", // relative path to the pdf file
      pdf: {
        version: '1.10.100',
        info: {
          PDFFormatVersion: '1.5',
          IsAcroFormPresent: false,
          IsXFAPresent: false,
          Author: 'nsiindia@outlook.com',
          Creator: 'Microsoft® Word 2010',
          Producer: 'Microsoft® Word 2010',
          CreationDate: "D:20210924172322+05'30'",
          ModDate: "D:20210924172322+05'30'"
        }
      }
    },
    id: undefined
  }
]
Here I have used RecursiveCharacterTextSplitter() to keep larger units, such as paragraphs, within one sub-document. You can try other types of splitters as well.
In this example the PDF is split into 14 documents (chunks), each with a structure similar to the one shown below.
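For illustration, a single chunk looks roughly like this (the text is shortened here, and the exact loc values will vary; loc is position metadata added by the splitter):
Document {
  pageContent: "…roughly 1,000 characters of text from the scheme rules…",
  metadata: {
    source: "./data/docs/SukanyaSamriddhiAccountSchemeRule.pdf",
    pdf: { /* same PDF info as shown above */ },
    loc: { lines: { from: 1, to: 25 } }
  },
  id: undefined
}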
This process ensures that the document is divided into coherent sections, preserving context for better retrieval.
Generating Embeddings
Convert the text chunks into vector embeddings using OpenAI's embedding model:
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import dotenv from "dotenv";
dotenv.config();

// create embeddings
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
  apiKey: process.env.OPENAI_API_KEY,
  batchSize: 512, // max 2048
  dimensions: 1024,
});

// view the embedded docs
const embeddedDocs = await embeddings.embedDocuments(
  chunks.map((c) => c.pageContent)
);
console.log("embeddedDocs:", embeddedDocs);
The output embeddedDocs contains one vector embedding per chunk, produced by the chosen model.
These embeddings represent the semantic meaning of each text chunk, facilitating similarity searches later on.
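To get a feel for the output, you can inspect its shape: each chunk maps to a plain array of 1024 numbers because of the dimensions: 1024 option above. The sample values here are illustrative, not actual output:
console.log(embeddedDocs.length);         // 14 – one vector per chunk
console.log(embeddedDocs[0].length);      // 1024 – matches the `dimensions` option
console.log(embeddedDocs[0].slice(0, 3)); // e.g. [ 0.0123, -0.0456, 0.0789 ]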
Storing Embeddings in Pinecone
Store the generated embeddings in Pinecone, a vector database optimized for similarity searches:
import { PineconeStore } from "@langchain/pinecone";
import { Pinecone } from "@pinecone-database/pinecone";

// initialize Pinecone
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.index(process.env.PINECONE_INDEX);

// instantiate VectorStore
const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
  pineconeIndex: index,
  // Maximum number of batch requests to allow at once. Each batch is 1000 vectors.
  maxConcurrency: 5,
});

const pineconeStore = await vectorStore.addDocuments(chunks);
console.log("pineconeStore:", pineconeStore);
The value returned by addDocuments (stored in pineconeStore) is an array of the ids assigned to the vectors in Pinecone:
[
'fd91d9b7-f46f-4d0b-834a-7ace58355666',
'a763de73-f6fe-44be-998c-d0204a435495',
'684bc40c-7b8d-4dda-9e2e-c2fdf9e380fa',
'4f4d5d67-eb0d-4e7e-b743-9c00633f8ecc',
'e2400a02-8408-486f-a9cf-9cbb4031fe79',
'dd6cb2cd-2542-4a8f-9a15-dbeaa6e24b0d',
'21f0d811-1e38-412f-8820-c28ec928754d',
'a8d7b243-6b2e-4abb-a0ae-8a8229848e71',
'ff0ac745-1558-44fe-96e5-5e57e5a44456',
'4de1334b-b5e7-41d6-822e-7f90bb3638d4',
'560ca1b6-2624-4078-8486-ba2a38178eb9',
'547ecb7d-8140-410f-bb0a-e07fa24855f3',
'449f19fa-f146-4cdd-9901-fb3b760c9077',
'e4806b11-4b1d-463e-9dc6-02e013766829'
]
This setup enables efficient retrieval of relevant document sections based on user queries.
Retrieving Relevant Chunks
When a user poses a question, retrieve the most relevant document chunks:
// optional filter
const filter = {}; // match to metadata
const retriever = vectorStore.asRetriever({
  filter,
  k: 2, // no. of results
});

const query = "What is the age limit for account opening?";
const relevantDocs = await retriever.invoke(query);
console.log("relevantDocs:", relevantDocs);
This returns the 2 document snippets whose embeddings are most similar to the query.
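Each entry in relevantDocs is a regular Document rebuilt from what was stored in Pinecone, carrying the chunk text and its metadata; roughly (content shortened here):
[
  Document {
    pageContent: "…text of the chunk most similar to the query…",
    metadata: { source: "./data/docs/SukanyaSamriddhiAccountSchemeRule.pdf", ... },
  },
  Document { ... }
]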
Generating Responses with OpenAI
Combine the retrieved documents with the user's query to generate a response using OpenAI's GPT model:
import { PromptTemplate } from "@langchain/core/prompts";
import {
  RunnableSequence,
  RunnablePassthrough,
} from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatOpenAI } from "@langchain/openai";

// Retrieval chain
const customPromptTemplate = `Answer the questions based only on the context provided.
If you don't know the answer, just say politely that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
{context}
Question: {question}
`;

const model = new ChatOpenAI({ temperature: 0.2, model: "gpt-4o-mini" });
const prompt = PromptTemplate.fromTemplate(customPromptTemplate);

const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);

const reply = await chain.invoke("Can I premature close the account ?");
console.log(reply);
The final output received is:
You may be able to prematurely close the account if there are extreme compassionate grounds, such as medical support for life-threatening diseases or the death of a guardian. Complete documentation must be provided to establish these grounds, and the accounts office must be satisfied. However, no premature closure can occur before certain conditions are met, which are not specified in the provided context.
RunnableSequence is a sequence of runnables, where the output of each is the input of the next.
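If RunnableSequence is new to you, here is a tiny standalone sketch, unrelated to the RAG chain above, showing how the output of each step becomes the input of the next; RunnableLambda simply wraps plain functions as runnables:
import { RunnableSequence, RunnableLambda } from "@langchain/core/runnables";

const toUpper = RunnableLambda.from((text) => text.toUpperCase());
const exclaim = RunnableLambda.from((text) => `${text}!`);

const demo = RunnableSequence.from([toUpper, exclaim]);
console.log(await demo.invoke("hello")); // HELLO!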
Visual Overview
Here's a simplified flow of the RAG system:
- PDF Document: Your source of information.
- Text Splitter: Divides the document into chunks.
- Embeddings Generator: Converts text chunks into vector embeddings.
- Pinecone Vector Store: Stores embeddings for efficient retrieval.
- Retriever: Fetches relevant chunks based on user queries.
- OpenAI LLM: Generates responses using the retrieved context.
Complete Code
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { PineconeStore } from "@langchain/pinecone";
import { Pinecone } from "@pinecone-database/pinecone";
import { PromptTemplate } from "@langchain/core/prompts";
import {
  RunnableSequence,
  RunnablePassthrough,
} from "@langchain/core/runnables";
import dotenv from "dotenv";
import { formatDocumentsAsString } from "langchain/util/document";
import { StringOutputParser } from "@langchain/core/output_parsers";

dotenv.config();

// load a document
const loader = new PDFLoader(
  "./data/docs/SukanyaSamriddhiAccountSchemeRule.pdf",
  {
    splitPages: false,
  }
);
const docs = await loader.load();
// console.log(docs[0]?.metadata?.pdf);

// split the document
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 150,
});
const chunks = await splitter.splitDocuments(docs);
// console.log("chunks:", chunks[1]);
console.log(`Split the document into ${chunks.length} sub-documents.`);

// create embeddings
const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
  apiKey: process.env.OPENAI_API_KEY,
  batchSize: 512, // max 2048
  dimensions: 1024,
});

/* const embeddedDocs = await embeddings.embedDocuments(
  chunks.map((c) => c.pageContent)
);
console.log("embeddedDocs:", embeddedDocs); */

// initialize Pinecone
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.index(process.env.PINECONE_INDEX);

// instantiate VectorStore
const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
  pineconeIndex: index,
  // Maximum number of batch requests to allow at once. Each batch is 1000 vectors.
  maxConcurrency: 5,
});

/* const pineconeStore = await vectorStore.addDocuments(chunks);
console.log("pineconeStore:", pineconeStore); */

// optional filter
const filter = {}; // match to metadata
const retriever = vectorStore.asRetriever({
  filter,
  k: 2, // no. of results
});

const query = "What is the age limit for account opening?";
const relevantDocs = await retriever.invoke(query);
console.log("relevantDocs:", relevantDocs);

// Retrieval chain
const customPromptTemplate = `Answer the questions based only on the context provided.
If you don't know the answer, just say politely that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
{context}
Question: {question}
`;

const model = new ChatOpenAI({ temperature: 0.2, model: "gpt-4o-mini" });
const prompt = PromptTemplate.fromTemplate(customPromptTemplate);

const chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  prompt,
  model,
  new StringOutputParser(),
]);

const reply = await chain.invoke("Can I premature close the account ?");
console.log(reply);
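A quick note on running this script (my assumption about the project setup, not something covered above): it uses ES module imports and top-level await, so package.json needs "type": "module" (or the file needs a .mjs extension), after which it can be run with a recent Node.js version:
// package.json (only the relevant field shown)
{ "type": "module" }

// then run (the filename is a placeholder)
node index.js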
Git Link
https://github.com/Devendra616/cohort-rag-pdf
Conclusion
By integrating LangChain.js, OpenAI, and Pinecone, we have built a robust RAG system capable of providing accurate, context-aware responses based on PDF documents. This setup is versatile and can be extended to other document types or data sources as needed.