Retrieval-Augmented Generation (RAG): Architecture & Evaluation with Ragas

Nishant Singh

As Large Language Models (LLMs) become powerful tools for question answering and summarization, one major challenge remains: retrieving up-to-date, domain-specific information. This is where Retrieval-Augmented Generation (RAG) systems come into play.

In this article, we’ll explore:

  1. What is a RAG system?

  2. RAG system architecture

  3. Benefits and challenges

  4. Evaluating RAG pipelines with the Ragas framework

  5. Sample Python code using Ragas


🧠 What is a RAG System?

RAG stands for Retrieval-Augmented Generation. It combines traditional information retrieval methods with generative language models to produce more accurate, grounded, and up-to-date responses.

Instead of relying solely on the model's pre-trained knowledge, RAG retrieves relevant documents from a knowledge base and feeds them into an LLM to generate a response.
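The retrieve-then-generate pattern can be sketched in a few lines. This is a toy illustration, not a real pipeline: the corpus, the word-overlap scoring, and the prompt template are all stand-ins for a real vector store and LLM call.

```python
# A minimal, self-contained sketch of the RAG pattern: retrieve relevant
# text for a query, then build an augmented prompt for the LLM.
CORPUS = [
    "Paris is the capital and most populous city of France.",
    "Hamlet is a tragedy written by William Shakespeare.",
    "The Eiffel Tower is located in Paris.",
]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Score documents by word overlap with the query (a stand-in for a vector store)."""
    q_words = set(query.lower().split())
    scored = sorted(corpus,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, contexts: list[str]) -> str:
    """Feed the retrieved context to the LLM alongside the question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context_block}\nQuestion: {query}")

query = "What is the capital of France?"
prompt = build_prompt(query, retrieve(query, CORPUS))
print(prompt)
```

In a real system, the final `prompt` would be sent to an LLM, which generates the grounded answer.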


🏗️ RAG System Architecture

The architecture of a RAG pipeline typically involves these components:

1. Question/Query

The user input or question.

2. Retriever

Fetches relevant documents from an external data source (e.g., a vector store or Elasticsearch).

  • Often uses dense vector embeddings (e.g., via SentenceTransformers or OpenAI embeddings), typically stored in an index such as FAISS

  • Algorithms: BM25, DPR, or hybrid search (sparse + dense)
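The core of dense retrieval is ranking documents by vector similarity. The sketch below uses toy bag-of-words vectors and cosine similarity so it runs without dependencies; a real retriever would substitute learned embeddings from a model like SentenceTransformers.

```python
# Illustrative dense-retrieval scoring with cosine similarity.
# The bag-of-words "embedding" is a stand-in for a real embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = ["Paris is the capital of France.",
        "BM25 is a sparse ranking function."]
query = "capital of France"
ranked = sorted(docs, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
print(ranked[0])
```

Hybrid search combines a sparse score like BM25 with a dense score like this one, usually via a weighted sum or reciprocal rank fusion.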

3. Reader / Generator (LLM)

An LLM takes the user query along with the retrieved context and generates a natural language response.

4. Optional Post-Processing

Can include filtering, ranking, or formatting.
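A simple post-processing pass might drop low-scoring contexts and remove duplicates before the answer is assembled. The threshold below is illustrative, not a recommended value.

```python
# A sketch of optional post-processing: keep unique documents above a
# relevance threshold, best first.
def postprocess(scored_docs: list[tuple[str, float]],
                min_score: float = 0.3) -> list[str]:
    seen, kept = set(), []
    for doc, score in sorted(scored_docs, key=lambda p: p[1], reverse=True):
        if score >= min_score and doc not in seen:
            seen.add(doc)
            kept.append(doc)
    return kept

print(postprocess([("A", 0.9), ("B", 0.1), ("A", 0.9), ("C", 0.5)]))
# → ['A', 'C']
```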

🔄 Flow

Query → Retriever → (retrieved context + query) → LLM → Answer → optional post-processing

✅ Benefits of RAG

  • Up-to-date: Can access information beyond the model's training cutoff

  • Domain-specific: Allows integration with private or niche datasets

  • Explainability: Retrieved documents can be shown for verification


⚠️ Challenges

  • Retrieval quality heavily impacts answer relevance

  • Latency due to multiple steps (retrieval + generation)

  • Evaluation is non-trivial due to multiple outputs (query, docs, answer)


📏 Evaluating RAG with Ragas

Ragas is an open-source framework for evaluating RAG pipelines.

It evaluates based on:

  • Faithfulness: Is the answer grounded in the retrieved context?

  • Answer Relevancy: Does the answer properly address the question?

  • Context Precision: Are retrieved contexts actually useful?

  • Context Recall: Are all required pieces of evidence included?
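To build intuition for the two context metrics, here is a conceptual illustration using binary relevance labels. This is not how Ragas computes them internally (Ragas uses an LLM judge), but it captures what each ratio measures.

```python
# Conceptual illustration of context precision and recall with
# binary relevance labels (not Ragas' actual implementation).
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of retrieved contexts that are actually useful."""
    return sum(c in relevant for c in retrieved) / len(retrieved)

def context_recall(retrieved: list[str], relevant: set[str]) -> float:
    """Fraction of the required evidence that was retrieved."""
    return sum(c in retrieved for c in relevant) / len(relevant)

retrieved = ["doc_paris", "doc_noise"]          # what the retriever returned
relevant = {"doc_paris", "doc_population"}      # what was actually needed
print(context_precision(retrieved, relevant))   # 0.5
print(context_recall(retrieved, relevant))      # 0.5
```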


🧪 Sample Code: RAG Evaluation with Ragas

Here's a minimal example showing how to use Ragas to evaluate a RAG system using Python.

▶️ Installation

pip install ragas datasets langchain

📄 Evaluation Script

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

# Note: Ragas metrics use an LLM as a judge, so a model must be
# configured (by default via an OpenAI API key in OPENAI_API_KEY).
# 1. Create a sample dataset
examples = [
    {
        "question": "What is the capital of France?",
        "answer": "The capital of France is Paris.",
        "contexts": ["Paris is the capital and most populous city of France."],
        "ground_truth": "Paris"
    },
    {
        "question": "Who wrote Hamlet?",
        "answer": "William Shakespeare wrote Hamlet.",
        "contexts": ["Hamlet is a tragedy written by William Shakespeare."],
        "ground_truth": "William Shakespeare"
    }
]

dataset = Dataset.from_list(examples)

# 2. Evaluate the dataset using Ragas
result = evaluate(
    dataset,
    metrics=[
        faithfulness,
        answer_relevancy,
        context_precision,
        context_recall,
    ]
)

# 3. Print the results
print("RAG Evaluation Results:")
print(result.to_pandas())

🧾 Output (illustrative; actual scores depend on the judge model)

RAG Evaluation Results:
   faithfulness  answer_relevancy  context_precision  context_recall
0           1.0               1.0                1.0             1.0
1           1.0               1.0                1.0             1.0

You can adapt this example to your own data by collecting:

  • User queries

  • Generated answers

  • Retrieved documents

  • Ground truth answers (optional but useful)
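Turning such collected logs into Ragas-ready rows is mostly a renaming exercise. The `logs` structure below is a hypothetical example of what a pipeline might record; the resulting list of dicts is what you would pass to `Dataset.from_list` as in the script above.

```python
# A sketch of mapping logged pipeline runs onto the Ragas column names.
logs = [
    {
        "query": "What is the capital of France?",
        "model_answer": "The capital of France is Paris.",
        "retrieved_docs": ["Paris is the capital and most populous city of France."],
        "reference": "Paris",
    },
]

rows = [
    {
        "question": log["query"],
        "answer": log["model_answer"],
        "contexts": log["retrieved_docs"],
        "ground_truth": log["reference"],  # optional, but required for context_recall
    }
    for log in logs
]
print(rows[0]["question"])
```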


🧠 Final Thoughts

RAG systems are a crucial step in making LLMs practical, reliable, and scalable in real-world scenarios, especially when paired with a sound evaluation framework like Ragas. Whether you're building a chatbot, a document assistant, or a knowledge Q&A system, adopting RAG + Ragas gives you transparency and confidence in what your model says, and why.

