Building a Retrieval-Based LLM with LangChain


In our previous article, we explored how LangChain can be used to build a simple LLM chain using a prompt template. This time, we’re diving deeper into a more powerful capability of LangChain — creating a Retrieval Chain, a pattern especially useful when dealing with large datasets that can’t be passed directly to the LLM.
Whether your data lives in SQL tables, documents, or on the open web, LangChain allows you to retrieve only the relevant information needed to answer a question — making LLMs smarter, faster, and more scalable.
Why Retrieval Chains?
Passing large amounts of raw data as input to an LLM is inefficient and often impossible due to context size limits. Retrieval chains solve this by introducing a smart middle step:
Instead of sending the entire dataset to the LLM, we embed it into numerical vectors and store it in a vector store.
When a user asks a question, a retriever searches the vector store for the most relevant chunks.
Only these relevant pieces, along with the user’s query, are sent to the LLM to generate a response.
Let’s walk through how this process works in practice using LangChain.
What Are Vector Stores and Embeddings?
Before we start building, let’s understand two key components:
Vector Embeddings: These are numerical representations of text that capture semantic meaning. For example, the words “dog” and “puppy” will have embeddings that are close in vector space since they are related in meaning (a small illustration follows after these definitions).
Vector Store: A specialized database that stores these embeddings, making it easy to search and retrieve relevant chunks using similarity-based methods.
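To make the idea of “close in vector space” concrete, here is a tiny, purely illustrative sketch. The three-dimensional vectors are made up; real embeddings have hundreds or thousands of dimensions, but the similarity comparison works the same way.
import math

# Toy vectors standing in for real embeddings (illustrative values only)
dog = [0.8, 0.1, 0.1]
puppy = [0.75, 0.15, 0.1]
banana = [0.1, 0.9, 0.2]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(dog, puppy))   # close to 1.0 -> semantically similar
print(cosine_similarity(dog, banana))  # noticeably lower -> less related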
Step-by-Step: Building a Retrieval Chain with LangChain
We’ll build a retrieval chain using a real website (https://www.dasa.org) as our data source. Here’s how:
1. Loading Website Data
We’ll start by fetching content from the website using WebBaseLoader. But first, install the required libraries (WebBaseLoader lives in the langchain-community package and relies on beautifulsoup4 for HTML parsing):
pip install langchain-community beautifulsoup4
Now, extract the data:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.dasa.org")
docs = loader.load()
The docs variable now holds the unstructured textual content from the site.
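If you want a quick sanity check on what was loaded, you can inspect the documents (optional):
# Optional: inspect what the loader returned
print(len(docs))                   # number of Document objects (typically one per URL)
print(docs[0].metadata)            # source URL and other metadata
print(docs[0].page_content[:200])  # first 200 characters of the page text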
2. Splitting the Text
Embedding and retrieval work best on smaller, focused pieces of text. We’ll use RecursiveCharacterTextSplitter to break the content into manageable chunks.
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
Now, documents contains smaller chunks suitable for embedding.
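The splitter’s defaults are fine to start with, but you can also set the chunk size and overlap explicitly. The values below are illustrative, not prescriptive:
# Illustrative settings: sizes are measured in characters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum characters per chunk
    chunk_overlap=200,  # overlap between consecutive chunks to preserve context
)
documents = text_splitter.split_documents(docs)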
3. Creating Vector Embeddings
Next, we convert these text chunks into vector embeddings using OpenAI’s embedding model (this requires the langchain-openai package, installable with pip install langchain-openai, and an OpenAI API key):
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
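To see what an embedding actually looks like, you can embed a single string directly. This optional check assumes your OPENAI_API_KEY environment variable is set:
# Optional: embed one string and look at the raw vector
query_vector = embeddings.embed_query("dog")
print(len(query_vector))  # embedding dimensionality (1536 for OpenAI's default embedding model)
print(query_vector[:5])   # first few components of the vector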
4. Storing Embeddings in a Vector Store
We’ll use FAISS, a fast and lightweight vector store developed by Facebook. Install it using:
pip install faiss-cpu
Then, store the embeddings:
from langchain_community.vectorstores import FAISS
vector = FAISS.from_documents(documents, embeddings)
Our vector store is now ready to power a retrieval-based question-answering system.
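Since embedding a whole website costs time and API calls, you may want to persist the index instead of rebuilding it on every run. FAISS supports this; the folder name below is just an example:
# Optional: save the index to disk and reload it later
vector.save_local("dasa_faiss_index")
vector = FAISS.load_local(
    "dasa_faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,  # recent LangChain versions require this opt-in for pickle-based loading
)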
5. Creating the Retrieval Chain
Now comes the core logic. We need to build a prompt template that includes both the retrieved context and the user’s question.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
llm = ChatOpenAI(api_key="<Your OpenAI API Key>")
prompt = ChatPromptTemplate.from_template("""
Answer the following question based only on the provided context:
<context>
{context}
</context>
Question: {input}
""")
Here, {context} will be filled with the retrieved data and {input} with the user’s question.
6. Combining It All
We’ll use LangChain’s create_stuff_documents_chain to tie the prompt and LLM together:
from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain = create_stuff_documents_chain(llm, prompt)
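Under the hood, this chain simply “stuffs” whatever documents it receives into the {context} placeholder. You can even invoke it on its own by passing documents explicitly, which is a handy way to test the prompt; the example document below is made up for illustration:
# Optional: test the document chain in isolation with a hand-made document
from langchain_core.documents import Document

test_answer = document_chain.invoke({
    "input": "What is LangChain?",
    "context": [Document(page_content="LangChain is a framework for building applications with LLMs.")],
})
print(test_answer)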
Next, create a retriever from the vector store:
retriever = vector.as_retriever()
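By default the retriever returns the top few most similar chunks for each query. If you want to control that number, you can pass it explicitly; k=4 below is just an example:
# Optional: retrieve a specific number of chunks per query
retriever = vector.as_retriever(search_kwargs={"k": 4})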
And finally, build the full retrieval chain:
from langchain.chains import create_retrieval_chain
retrieval_chain = create_retrieval_chain(retriever, document_chain)
7. Ask Your Question!
With everything wired up, you can now ask questions based on your custom dataset:
response = retrieval_chain.invoke({
    "input": "What are the talent products delivered by DASA?"
})
print(response["answer"])
The LLM, equipped with only the most relevant snippets from the website, provides an accurate and concise answer:
"The talent products delivered by DASA focus on enhancing individual and team capabilities in organizations, preparing them for high-performance environments that boost enterprise agility, customer centricity, and innovation."
The Power of Retrieval Chains
LangChain’s Retrieval Chains offer an efficient way to use large and complex datasets as LLM context — without exceeding token limits or sacrificing accuracy. Here's how it works, in summary:
Load and split data
Convert to vector embeddings
Store in a vector database
Retrieve relevant chunks per query
Send only those chunks + query to the LLM
This design makes your applications smarter, faster, and more scalable.
What’s Next?
In upcoming posts, we’ll explore even more advanced LangChain patterns, including:
Conversational Retrieval Chains — allowing the LLM to remember previous messages in a chat-like interface.
Agents — where the LLM autonomously decides what actions to take, enabling more dynamic and multi-step workflows.
Stay tuned! 👍
Written by Anuj Kumar Upadhyay
I am a developer from India, passionate about contributing to the tech community through my writing. I am currently pursuing my graduation in Computer Applications.