Retrieval-Augmented Generation (RAG): Architecture & Evaluation with Ragas


As Large Language Models (LLMs) become powerful tools for question answering and summarization, one major challenge remains: retrieving up-to-date and domain-specific information. This is where Retrieval-Augmented Generation (RAG) systems come into play.
In this article, we’ll explore:
What is a RAG system?
RAG system architecture
Benefits and challenges
Evaluating RAG pipelines with the Ragas framework
Sample Python code using Ragas
🧠 What is a RAG System?
RAG stands for Retrieval-Augmented Generation. It combines traditional information retrieval methods with generative language models to produce more accurate, grounded, and up-to-date responses.
Instead of relying solely on the model's pre-trained knowledge, RAG retrieves relevant documents from a knowledge base and feeds them into an LLM to generate a response.
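At its core, the pipeline fits in a few lines. A minimal sketch, where retriever.search and llm.complete are placeholder interfaces standing in for whatever retriever and LLM client you actually use:
def answer_with_rag(question, retriever, llm):
    # 1. Retrieve the top-k documents most relevant to the question
    docs = retriever.search(question, top_k=3)
    # 2. Put the retrieved evidence into the prompt
    context = "\n\n".join(doc.text for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # 3. Let the LLM generate the grounded response
    return llm.complete(prompt)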
🏗️ RAG System Architecture
The architecture of a RAG pipeline typically involves these components:
1. Question/Query
The user input or question.
2. Retriever
Fetches relevant documents from an external data source (e.g., a vector store, Elasticsearch, etc.)
Often uses dense vector embeddings (e.g., from SentenceTransformers or OpenAI Embeddings) stored in a vector index such as FAISS
Algorithms: BM25, DPR, or hybrid search (sparse + dense)
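To make the dense path concrete, here is a minimal retriever using sentence-transformers and FAISS (the model name and toy corpus are illustrative):
import faiss
from sentence_transformers import SentenceTransformer

documents = [
    "Paris is the capital and most populous city of France.",
    "Hamlet is a tragedy written by William Shakespeare.",
]

# Embed the corpus; normalized vectors make inner product == cosine similarity
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = model.encode(documents, normalize_embeddings=True)

# Build a flat inner-product index over the document vectors
index = faiss.IndexFlatIP(doc_vectors.shape[1])
index.add(doc_vectors)

def retrieve(query, top_k=1):
    query_vector = model.encode([query], normalize_embeddings=True)
    scores, ids = index.search(query_vector, top_k)
    return [documents[i] for i in ids[0]]

print(retrieve("What is the capital of France?"))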
3. Reader / Generator (LLM)
An LLM takes the user query along with the retrieved context and generates a natural language response.
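As one concrete option, the generation step might look like this with the OpenAI client (the model name and prompt wording here are just examples):
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate(question, contexts):
    context_block = "\n\n".join(contexts)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works here
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context_block}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content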
4. Optional Post-Processing
Can include filtering, ranking, or formatting.
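For instance, a simple filter might drop low-scoring hits before they reach the LLM. A quick sketch (the hit structure and the 0.5 threshold are just illustrative):
def filter_by_score(hits, min_score=0.5):
    # Keep only hits whose retrieval score clears the threshold,
    # strongest first, so the LLM sees the best evidence
    kept = [h for h in hits if h["score"] >= min_score]
    return sorted(kept, key=lambda h: h["score"], reverse=True)

hits = [
    {"text": "Paris is the capital of France.", "score": 0.91},
    {"text": "France borders Belgium.", "score": 0.32},
]
print(filter_by_score(hits))  # only the first hit survives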
🔄 Flow Diagram
User Query → Retriever → Retrieved Context → LLM (Generator) → Final Answer
✅ Benefits of RAG
Up-to-date: Can access information beyond the model's training cutoff
Domain-specific: Allows integration with private or niche datasets
Explainability: Retrieved documents can be shown for verification
⚠️ Challenges
Retrieval quality heavily impacts answer relevance
Latency due to multiple steps (retrieval + generation)
Evaluation is non-trivial due to multiple outputs (query, docs, answer)
📏 Evaluating RAG with Ragas
Ragas is an open-source framework for evaluating RAG pipelines.
It scores each example on four metrics:
Faithfulness: Is the answer grounded in the retrieved context?
Answer Relevancy: Does the answer directly address the question?
Context Precision: Are the retrieved contexts actually useful, with the useful ones ranked first? (A toy version is sketched after this list.)
Context Recall: Are all required pieces of evidence included?
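To make one of these concrete: context precision rewards retrievers that rank the useful chunks first. Below is a simplified, self-contained version of that idea (an illustration of the general precision@k formulation, not the exact Ragas implementation), given 0/1 relevance judgments for the ranked contexts:
def context_precision(relevance):
    # relevance[k] is 1 if the (k+1)-th retrieved context was useful, else 0
    score, hits = 0.0, 0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            score += hits / k  # precision@k, counted at each relevant rank
    return score / hits if hits else 0.0

print(context_precision([1, 0, 1]))  # (1/1 + 2/3) / 2 ≈ 0.83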
🧪 Sample Code: RAG Evaluation with Ragas
Here's a minimal example showing how to use Ragas to evaluate a RAG system using Python. Ragas scores these metrics with an LLM judge (OpenAI by default), so make sure your OPENAI_API_KEY environment variable is set before running.
▶️ Installation
pip install ragas datasets langchain
📄 Evaluation Script
from datasets import Dataset
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall
from ragas import evaluate

# 1. Create a sample dataset
examples = [
    {
        "question": "What is the capital of France?",
        "answer": "The capital of France is Paris.",
        "contexts": ["Paris is the capital and most populous city of France."],
        "ground_truth": "Paris",
    },
    {
        "question": "Who wrote Hamlet?",
        "answer": "William Shakespeare wrote Hamlet.",
        "contexts": ["Hamlet is a tragedy written by William Shakespeare."],
        "ground_truth": "William Shakespeare",
    },
]
dataset = Dataset.from_list(examples)

# 2. Evaluate the dataset using Ragas
result = evaluate(
    dataset,
    metrics=[
        faithfulness,
        answer_relevancy,
        context_precision,
        context_recall,
    ],
)

# 3. Print the results
print("RAG Evaluation Results:")
print(result.to_pandas())
🧾 Output
RAG Evaluation Results:
   faithfulness  answer_relevancy  context_precision  context_recall
0           1.0               1.0                1.0             1.0
1           1.0               1.0                1.0             1.0
Note that the scores come from an LLM judge, so in practice they will rarely all be a perfect 1.0 and can vary slightly between runs.
You can adapt this example to your own data (see the sketch after this list) by collecting:
User queries
Generated answers
Retrieved documents
Ground truth answers (optional but useful)
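In practice, that means logging each interaction from your pipeline and assembling the same fields. A minimal sketch, assuming a hypothetical rag_pipeline object with retrieve and answer methods (stand-ins for your own retriever and generator):
records = []
for question in user_queries:  # your logged queries
    contexts = rag_pipeline.retrieve(question)        # hypothetical: your retriever
    answer = rag_pipeline.answer(question, contexts)  # hypothetical: your generator
    records.append({
        "question": question,
        "answer": answer,
        "contexts": contexts,
        "ground_truth": "",  # fill in if reference answers are available
    })
dataset = Dataset.from_list(records)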
🧠 Final Thoughts
RAG systems are a crucial step in making LLMs practical, reliable, and scalable in real-world scenarios, especially when paired with a sound evaluation framework like Ragas. Whether you’re building a chatbot, a document assistant, or a knowledge Q&A system, adopting RAG + Ragas gives you transparency and confidence in what your model says—and why.