Retrieval-Augmented Generation (RAG): Architecture & Evaluation with Ragas

Nishant Singh

As Large Language Models (LLMs) become powerful tools for question answering and summarization, one major challenge remains: retrieving up-to-date, domain-specific information. This is where Retrieval-Augmented Generation (RAG) systems come into play.

In this article, we’ll explore:

  1. What is a RAG system?

  2. RAG system architecture

  3. Benefits and challenges

  4. Evaluating RAG pipelines with the Ragas framework

  5. Sample Python code using Ragas


🧠 What is a RAG System?

RAG stands for Retrieval-Augmented Generation. It combines traditional information retrieval methods with generative language models to produce more accurate, grounded, and up-to-date responses.

Instead of relying solely on the model's pre-trained knowledge, RAG retrieves relevant documents from a knowledge base and feeds them into an LLM to generate a response.
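The retrieve-then-generate pattern can be sketched in a few lines. This is a toy illustration, not a real pipeline: the corpus, the word-overlap scoring, and the prompt template are all stand-ins for a real vector store and LLM call.

```python
# A minimal, self-contained sketch of the RAG pattern: retrieve relevant
# text for a query, then build an augmented prompt for the LLM.
CORPUS = [
    "Paris is the capital and most populous city of France.",
    "Hamlet is a tragedy written by William Shakespeare.",
    "The Eiffel Tower is located in Paris.",
]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Score documents by word overlap with the query (a stand-in for a vector store)."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, contexts: list[str]) -> str:
    """Feed the retrieved context to the LLM alongside the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context_block}\nQuestion: {query}")

query = "What is the capital of France?"
prompt = build_prompt(query, retrieve(query, CORPUS))
print(prompt)
```

In a real system, the final `prompt` would be sent to an LLM, which generates the grounded answer.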


🏗️ RAG System Architecture

The architecture of a RAG pipeline typically involves these components:

1. Question/Query

The user input or question.

2. Retriever

Fetches relevant documents from an external data source (e.g., a vector store or Elasticsearch).

  • Often uses dense vector embeddings (e.g., via SentenceTransformers or OpenAI embeddings), typically stored in an index such as FAISS

  • Algorithms: BM25, DPR, or hybrid search (sparse + dense)
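The core of dense retrieval is ranking documents by vector similarity. The sketch below uses toy bag-of-words vectors and cosine similarity so it runs without dependencies; a real retriever would substitute learned embeddings from a model like SentenceTransformers.

```python
# Illustrative dense-retrieval scoring with cosine similarity.
# The bag-of-words "embedding" is a stand-in for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = ["Paris is the capital of France.",
        "BM25 is a sparse ranking function."]
query = "capital of France"
ranked = sorted(docs, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
print(ranked[0])
```

Hybrid search combines a sparse score like BM25 with a dense score like this one, usually via a weighted sum or reciprocal rank fusion.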

3. Reader / Generator (LLM)

An LLM takes the user query along with the retrieved context and generates a natural language response.

4. Optional Post-Processing

Can include filtering, ranking, or formatting.
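A simple post-processing pass might drop low-scoring contexts and remove duplicates before the answer is assembled. The threshold below is illustrative, not a recommended value.

```python
# A sketch of optional post-processing: keep unique documents above a
# relevance threshold, best first.
def postprocess(scored_docs: list[tuple[str, float]],
                min_score: float = 0.3) -> list[str]:
    seen, kept = set(), []
    for doc, score in sorted(scored_docs, key=lambda p: p[1], reverse=True):
        if score >= min_score and doc not in seen:
            seen.add(doc)
            kept.append(doc)
    return kept

print(postprocess([("A", 0.9), ("B", 0.1), ("A", 0.9), ("C", 0.5)]))
# → ['A', 'C']
```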

🔄 Flow

Query → Retriever → (retrieved context + query) → LLM → Answer → optional post-processing

✅ Benefits of RAG

  • Up-to-date: Can access information beyond the model's training cutoff

  • Domain-specific: Allows integration with private or niche datasets

  • Explainability: Retrieved documents can be shown for verification


⚠️ Challenges

  • Retrieval quality heavily impacts answer relevance

  • Latency due to multiple steps (retrieval + generation)

  • Evaluation is non-trivial due to multiple outputs (query, docs, answer)


📏 Evaluating RAG with Ragas

Ragas is an open-source framework for evaluating RAG pipelines.

It evaluates based on:

  • Faithfulness: Is the answer grounded in the retrieved context?

  • Answer Relevancy: Does the answer properly address the question?

  • Context Precision: Are retrieved contexts actually useful?

  • Context Recall: Are all required pieces of evidence included?
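To build intuition for the two context metrics, here is a conceptual illustration using binary relevance labels. This is not how Ragas computes them internally (Ragas uses an LLM judge), but it captures what each ratio measures.

```python
# Conceptual illustration of context precision and recall with
# binary relevance labels (not Ragas' actual implementation).
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved contexts that are actually useful."""
    return sum(c in relevant for c in retrieved) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the required evidence that was retrieved."""
    return sum(c in retrieved for c in relevant) / len(relevant)

retrieved = ["doc_paris", "doc_noise"]          # what the retriever returned
relevant = {"doc_paris", "doc_population"}      # what was actually needed
print(context_precision(retrieved, relevant))   # 0.5
print(context_recall(retrieved, relevant))      # 0.5
```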


🧪 Sample Code: RAG Evaluation with Ragas

Here's a minimal example showing how to use Ragas to evaluate a RAG system using Python.

▶️ Installation

pip install ragas datasets langchain

📄 Evaluation Script

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

# Note: Ragas metrics use an LLM as a judge, so a model must be
# configured (by default via an OpenAI API key in OPENAI_API_KEY).
# 1. Create a sample dataset
examples = [
    {
        "question": "What is the capital of France?",
        "answer": "The capital of France is Paris.",
        "contexts": ["Paris is the capital and most populous city of France."],
        "ground_truth": "Paris"
    },
    {
        "question": "Who wrote Hamlet?",
        "answer": "William Shakespeare wrote Hamlet.",
        "contexts": ["Hamlet is a tragedy written by William Shakespeare."],
        "ground_truth": "William Shakespeare"
    }
]

dataset = Dataset.from_list(examples)

# 2. Evaluate the dataset using Ragas
result = evaluate(
    dataset,
    metrics=[
        faithfulness,
        answer_relevancy,
        context_precision,
        context_recall,
    ]
)

# 3. Print the results
print("RAG Evaluation Results:")
print(result.to_pandas())

🧾 Output (illustrative; actual scores depend on the judge model)

RAG Evaluation Results:
   faithfulness  answer_relevancy  context_precision  context_recall
0           1.0               1.0                1.0             1.0
1           1.0               1.0                1.0             1.0

You can adapt this example to your own data by collecting:

  • User queries

  • Generated answers

  • Retrieved documents

  • Ground truth answers (optional but useful)
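Turning such collected logs into Ragas-ready rows is mostly a renaming exercise. The `logs` structure below is a hypothetical example of what a pipeline might record; the resulting list of dicts is what you would pass to `Dataset.from_list` as in the script above.

```python
# A sketch of mapping logged pipeline runs onto the Ragas column names.
logs = [
    {
        "query": "What is the capital of France?",
        "model_answer": "The capital of France is Paris.",
        "retrieved_docs": ["Paris is the capital and most populous city of France."],
        "reference": "Paris",
    },
]

rows = [
    {
        "question": log["query"],
        "answer": log["model_answer"],
        "contexts": log["retrieved_docs"],
        "ground_truth": log["reference"],  # optional, but required for context_recall
    }
    for log in logs
]
print(rows[0]["question"])
```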


🧠 Final Thoughts

RAG systems are a crucial step in making LLMs practical, reliable, and scalable in real-world scenarios, especially when paired with a sound evaluation framework like Ragas. Whether you're building a chatbot, a document assistant, or a knowledge Q&A system, adopting RAG + Ragas gives you transparency and confidence in what your model says, and why.

