Building a Retrieval-Based LLM with LangChain


In our previous article, we explored how LangChain can be used to build a simple LLM chain using a prompt template. This time, we’re diving deeper into a more powerful capability of LangChain — creating a Retrieval Chain, a pattern especially useful when dealing with large datasets that can’t be passed directly to the LLM.
Whether your data lives in SQL tables, documents, or on the open web, LangChain allows you to retrieve only the relevant information needed to answer a question — making LLMs smarter, faster, and more scalable.
Why Retrieval Chains?
Passing large amounts of raw data as input to an LLM is inefficient and often impossible due to context size limits. Retrieval chains solve this by introducing a smart middle step:
Instead of sending the entire dataset to the LLM, we embed it into numerical vectors and store it in a vector store.
When a user asks a question, a retriever searches the vector store for the most relevant chunks.
Only these relevant pieces, along with the user’s query, are sent to the LLM to generate a response.
Let’s walk through how this process works in practice using LangChain.
What Are Vector Stores and Embeddings?
Before we start building, let’s understand two key components:
Vector Embeddings: These are numerical representations of text that capture semantic meaning. For example, the words “dog” and “puppy” will have embeddings that are close in vector space since they are related in meaning (a small illustration follows after these definitions).
Vector Store: A specialized database that stores these embeddings, making it easy to search and retrieve relevant chunks using similarity-based methods.
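To make the idea of “close in vector space” concrete, here is a tiny, purely illustrative sketch. The three-dimensional vectors are made up; real embeddings have hundreds or thousands of dimensions, but the similarity comparison works the same way.
import math

# Toy vectors standing in for real embeddings (illustrative values only)
dog = [0.8, 0.1, 0.1]
puppy = [0.75, 0.15, 0.1]
banana = [0.1, 0.9, 0.2]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(dog, puppy))   # close to 1.0 -> semantically similar
print(cosine_similarity(dog, banana))  # noticeably lower -> less related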
Step-by-Step: Building a Retrieval Chain with LangChain
We’ll build a retrieval chain using a real website (https://www.dasa.org) as our data source. Here’s how:
1. Loading Website Data
We’ll start by fetching content from the website using WebBaseLoader. But first, install the required libraries (WebBaseLoader lives in the langchain-community package and relies on beautifulsoup4 for HTML parsing):
pip install langchain-community beautifulsoup4
Now, extract the data:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.dasa.org")
docs = loader.load()
The docs variable now holds the unstructured textual content from the site.
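If you want a quick sanity check on what was loaded, you can inspect the documents (optional):
# Optional: inspect what the loader returned
print(len(docs))                   # number of Document objects (typically one per URL)
print(docs[0].metadata)            # source URL and other metadata
print(docs[0].page_content[:200])  # first 200 characters of the page text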
2. Splitting the Text
Embedding and retrieval work best on smaller, focused pieces of text. We’ll use RecursiveCharacterTextSplitter to break the content into manageable chunks.
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
Now, documents contains smaller chunks suitable for embedding.
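The splitter’s defaults are fine to start with, but you can also set the chunk size and overlap explicitly. The values below are illustrative, not prescriptive:
# Illustrative settings: sizes are measured in characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum characters per chunk
    chunk_overlap=200,  # overlap between consecutive chunks to preserve context
)
documents = text_splitter.split_documents(docs)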
3. Creating Vector Embeddings
Next, we convert these text chunks into vector embeddings using OpenAI’s embedding model (this requires the langchain-openai package, installable with pip install langchain-openai, and an OpenAI API key):
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
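To see what an embedding actually looks like, you can embed a single string directly. This optional check assumes your OPENAI_API_KEY environment variable is set:
# Optional: embed one string and look at the raw vector
query_vector = embeddings.embed_query("dog")
print(len(query_vector))  # embedding dimensionality (1536 for OpenAI's default embedding model)
print(query_vector[:5])   # first few components of the vector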
4. Storing Embeddings in a Vector Store
We’ll use FAISS, a fast and lightweight vector store developed by Facebook. Install it using:
pip install faiss-cpu
Then, store the embeddings:
from langchain_community.vectorstores import FAISS
vector = FAISS.from_documents(documents, embeddings)
Our vector store is now ready to power a retrieval-based question-answering system.
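Since embedding a whole website costs time and API calls, you may want to persist the index instead of rebuilding it on every run. FAISS supports this; the folder name below is just an example:
# Optional: save the index to disk and reload it later
vector.save_local("dasa_faiss_index")
vector = FAISS.load_local(
    "dasa_faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,  # recent LangChain versions require this opt-in for pickle-based loading
)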
5. Creating the Retrieval Chain
Now comes the core logic. We need to build a prompt template that includes both the retrieved context and the user’s question.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOpenAI(api_key="<Your OpenAI API Key>")
prompt = ChatPromptTemplate.from_template("""
Answer the following question based only on the provided context:
<context>
{context}
</context>
Question: {input}
""")
Here, {context} will be filled with the retrieved data and {input} with the user’s question.
6. Combining It All
We’ll use LangChain’s create_stuff_documents_chain to tie the prompt and LLM together:
from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain = create_stuff_documents_chain(llm, prompt)
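Under the hood, this chain simply “stuffs” whatever documents it receives into the {context} placeholder. You can even invoke it on its own by passing documents explicitly, which is a handy way to test the prompt; the example document below is made up for illustration:
# Optional: test the document chain in isolation with a hand-made document
from langchain_core.documents import Document

test_answer = document_chain.invoke({
    "input": "What is LangChain?",
    "context": [Document(page_content="LangChain is a framework for building applications with LLMs.")],
})
print(test_answer)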
Next, create a retriever from the vector store:
retriever = vector.as_retriever()
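By default the retriever returns the top few most similar chunks for each query. If you want to control that number, you can pass it explicitly; k=4 below is just an example:
# Optional: retrieve a specific number of chunks per query
retriever = vector.as_retriever(search_kwargs={"k": 4})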
And finally, build the full retrieval chain:
from langchain.chains import create_retrieval_chain
retrieval_chain = create_retrieval_chain(retriever, document_chain)
7. Ask Your Question!
With everything wired up, you can now ask questions based on your custom dataset:
response = retrieval_chain.invoke({
    "input": "What are the talent products delivered by DASA?"
})
print(response["answer"])
The LLM, equipped with only the most relevant snippets from the website, provides an accurate and concise answer:
"The talent products delivered by DASA focus on enhancing individual and team capabilities in organizations, preparing them for high-performance environments that boost enterprise agility, customer centricity, and innovation."
The Power of Retrieval Chains
LangChain’s Retrieval Chains offer an efficient way to use large and complex datasets as LLM context — without exceeding token limits or sacrificing accuracy. Here's how it works, in summary:
Load and split data
Convert to vector embeddings
Store in a vector database
Retrieve relevant chunks per query
Send only those chunks + query to the LLM
This design makes your applications smarter, faster, and more scalable.
What’s Next?
In upcoming posts, we’ll explore even more advanced LangChain patterns, including:
Conversational Retrieval Chains — allowing the LLM to remember previous messages in a chat-like interface.
Agents — where the LLM autonomously decides what actions to take, enabling more dynamic and multi-step workflows.
Stay tuned! 👍
Written by Anuj Kumar Upadhyay
I am a developer from India, passionate about contributing to the tech community through my writing. I am currently pursuing my graduation in Computer Applications.