Don’t Just Pass the Ball, Pass the Embeddings! How AI Plays Football with RAG + Qdrant

Paras Munoli
10 min read

If you've ever dreamed of winning the Football World Cup and building a smart AI assistant in the same week, you're in for a treat. Today we're diving into the RAG (Retrieval-Augmented Generation) pipeline, but instead of your typical dry ML explanation, we're putting on our cleats and heading to the pitch.

Let's break down how RAG works, piece by piece, just like assembling a football (soccer) team, with code examples using Gemini AI.

What is RAG?

Imagine your Large Language Model (LLM) is like a star football player: a legendary striker like Messi or Ronaldo. This player has trained for years, knows countless moves, and can score goals from nearly anywhere on the pitch. That's how an LLM works: it has been trained on massive amounts of text and can generate fluent, creative responses.

But here’s the catch: Even the best player can’t remember every single playbook or strategy from every team they face. Sometimes they need fresh, up-to-date tactics or specific info about an opposing team to really shine in a match.

In technical terms:

RAG = LLM (Large Language Model) + External Knowledge Source (via retrieval)

It combines retrieval from a knowledge base and generation via a language model to give smart, context-aware responses.
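In plain Python, the whole idea fits in a few lines. Here's a toy sketch with stand-in functions (nothing here is a real library; it just shows the shape of the pipeline we'll build below):

def search_knowledge_base(question: str) -> str:
    # Stand-in retriever: in the real pipeline this becomes a vector search over your documents
    return "Miroslav Klose holds the record with 16 World Cup goals."

def llm_generate(prompt: str) -> str:
    # Stand-in LLM: in the real pipeline this becomes a call to Gemini
    return f"(model answer based on)\n{prompt}"

def answer_with_rag(question: str) -> str:
    snippets = search_knowledge_base(question)                 # retrieval
    prompt = f"Context:\n{snippets}\n\nQuestion: {question}"   # augmentation
    return llm_generate(prompt)                                # generation

print(answer_with_rag("Who has scored the most goals in World Cup history?"))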

Components of a RAG Pipeline (The Team)

Let’s build our AI football team, where each stage of the RAG pipeline corresponds to a football strategy:

1. Knowledge Source = Scouting Report

Imagine you're a football coach preparing for a big match. One of the first things you do is collect scouting reports: detailed notes about your upcoming opponents, their star players, tactics, and weaknesses. You wouldn't just rely on memory or guesswork. Instead, you gather valuable information from multiple sources like:

  • Past match videos

  • Player statistics

  • Expert analyses

  • Training reports

These reports help you build a clear picture so your team can prepare the best strategy.

Your Knowledge Source in the RAG pipeline is just like that scouting report for your AI.

  • It’s where all the important information lives.

  • It can be PDFs, websites, internal documents (like Notion, Confluence), databases, or any collection of text.

  • This knowledge is what your AI will later reference to provide accurate answers.

For our example, imagine we have a file called "football_facts.md", a markdown document filled with interesting football trivia, player records, World Cup stats, etc.

from langchain.document_loaders import TextLoader

loader = TextLoader("football_facts.md")
documents = loader.load()

Here’s what happens:

  • The TextLoader is like your assistant collecting the scouting reports from the filing cabinet.

  • It loads the raw text content from the markdown file into the pipeline.

  • This “raw data” now becomes your knowledge base foundation.
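If you want to confirm the scouting reports actually made it into the pipeline, a quick sanity check helps (this assumes football_facts.md sits in the same directory as your script):

# Quick sanity check on what the loader picked up
print(f"Loaded {len(documents)} document(s)")
print(documents[0].page_content[:200])  # peek at the first 200 characters of the scouting report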

2. Chunking = Passing the Ball

Imagine you're on the football field with the ball: you don't just kick it all the way across in one go. Instead, you pass the ball strategically to teammates in manageable portions, keeping control and making sure it reaches the right place at the right time.

Your knowledge source (the scouting reports) can be massive: pages and pages of text. Trying to process it all at once would be like attempting a goal from midfield without any passes: clumsy and ineffective. So what we do is break the data into smaller, manageable pieces called chunks.

Why chunk?

  • Efficient retrieval: Smaller chunks are easier to search through quickly.

  • Context preservation: Overlapping chunks (where pieces share some content) help keep context intact, much like how a good pass keeps the flow of the game.

  • Better matching: When a user asks a question, you want the system to retrieve the most relevant chunk, not an overwhelming blob of text.

# Chunking
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(documents)

  • Here, chunk_size=500 means each chunk is about 500 characters: roughly a paragraph or two.

  • chunk_overlap=100 means the last 100 characters of one chunk are repeated at the start of the next. This overlap ensures the AI keeps some context flowing across chunks, similar to a smooth pass that keeps your team coordinated. (You can see the overlap for yourself with the quick check below.)
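Here's a small, optional check of that overlap in action; it assumes the splitter above produced at least two chunks from football_facts.md:

# Inspect the overlap between consecutive chunks (requires at least two chunks)
print(f"{len(chunks)} chunks created")
print("End of chunk 1:   ...", chunks[0].page_content[-100:])
print("Start of chunk 2:", chunks[1].page_content[:100], "...")
# If the chunks come from one continuous stretch of text, the last ~100 characters
# of chunk 1 should reappear at the start of chunk 2.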

Football analogy recap:

| Football Concept | RAG Pipeline Step | Explanation |
| --- | --- | --- |
| Passing the ball | Chunking | Breaking down knowledge into "passable" pieces |
| Overlapping passes | Overlapping chunks | Keeping context and a smooth transition |
| Team control & flow | Controlled retrieval | Helps the AI find the right info efficiently |

3. User Query = The Coach’s Command

You’re in the middle of a tense match. The coach is pacing the sidelines and suddenly shouts:

“Mark the striker! He’s their top scorer!”

That command is clear, strategic, and time-sensitive, and it's meant to trigger immediate action. In your RAG pipeline, the user query plays exactly the same role. It's the directive that sets everything in motion.

The user query is like the coach shouting instructions from the dugout: it's a prompt that asks your AI assistant (the player) to analyze the situation, look up the playbook (knowledge source), and make a smart move (response).

query = "Who has scored the most goals in World Cup history?"

This is your coach asking a very targeted question. The AI’s job now is to:

  1. Understand the intent: recognize that it's a factual sports trivia question.

  2. Search the knowledge base: Look through the “scouting reports” (chunks).

  3. Retrieve relevant info: get just the right facts, with no wild guesses or aimless dribbling!

4. Retrieval = Midfielder Picks the Best Options

Imagine you're watching a football match. The coach has shouted a command (the user query), and now it's up to the midfielder to act. But not just any pass will do; the midfielder must:

  • Scan the field

  • Spot the most strategic players (forwards)

  • Deliver the perfect assist

This is exactly what retrieval does in a RAG pipeline. The AI midfielder (the retriever) scans through all the available knowledge chunks and picks out the ones most relevant to the user's query.

In this case, the midfielder (retriever) uses vector search: a method that transforms both the query and all the knowledge chunks into high-dimensional vectors and finds the ones that are closest in meaning.
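Before wiring up the real thing, here's a toy illustration of what "closest in meaning" looks like numerically. The three-dimensional vectors below are made up purely for demonstration (real Gemini embeddings have hundreds of dimensions), and the example assumes NumPy is installed:

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = pointing the same way (very similar meaning), near 0.0 = unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Tiny made-up "embeddings", just to show the idea
query_vec      = np.array([0.9, 0.1, 0.0])  # "Who scored the most World Cup goals?"
chunk_goals    = np.array([0.8, 0.2, 0.1])  # chunk about goal-scoring records
chunk_stadiums = np.array([0.1, 0.1, 0.9])  # chunk about stadium capacities

print(cosine_similarity(query_vec, chunk_goals))     # high similarity -> retrieved
print(cosine_similarity(query_vec, chunk_stadiums))  # low similarity  -> ignored

Now let's build the real retriever with Qdrant and Gemini embeddings: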

  1. Import Libraries

     from langchain.vectorstores import Qdrant
     from langchain_google_genai import GoogleGenerativeAIEmbeddings
     from qdrant_client import QdrantClient
     from qdrant_client.http.models import Distance, VectorParams
    
  2. Generate Embeddings using Gemini

     embeddings = GoogleGenerativeAIEmbeddings(
         model="models/embedding-001",
         google_api_key="your-gemini-api-key"
     )
    
  3. Connect to the local Qdrant instance (spun up with Docker)

     # Assumes Qdrant is already running locally, e.g. via the official
     # qdrant/qdrant Docker image exposing port 6333.
     # A direct client is handy for managing collections (the full script below uses it).
     qdrant = QdrantClient(
         host="localhost",
         port=6333,
     )
    
  4. Pick a name for the collection (like a database table); it gets created in the next step

     collection_name = "football-facts"
    
  5. Load chunks into Qdrant as the vector database

     # from_documents embeds every chunk and writes it into the collection.
     # It opens its own connection, so we pass host/port rather than the client object.
     vectorstore = Qdrant.from_documents(
         documents=chunks,
         embedding=embeddings,
         host="localhost",
         port=6333,
         collection_name=collection_name,
     )
    
  6. Perform similarity search (retrieve top 5 matching chunks)

     query = "Who has scored the most goals in World Cup history?"
     relevant_docs = vectorstore.similarity_search(query, k=5)
    
     # Print results
     for doc in relevant_docs:
         print(doc.page_content)
    

Next step? Let's pass these “retrieved balls” to your LLM striker (Gemini) to score that perfect goal.
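One quick aside before the handoff: LangChain vector stores can also be wrapped in a retriever object, which is convenient if you later want to plug the same search into chains. A minimal sketch, assuming the vectorstore built above:

# Optional: the same top-5 search, expressed through LangChain's retriever interface
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
relevant_docs = retriever.get_relevant_documents(query)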

5. System Prompt = Game Plan

In football, before the whistle blows, the coach gathers the team and says:

“Here’s what we know about the opponent. Here's the formation we’re running. Watch out for their left winger. Our target? Win the midfield and feed the striker!”

This isn't just noise; it's strategic context. It's a tactical blueprint that guides players to make smart, coordinated decisions during the game. In RAG, we do the same thing with a system prompt: it's the final preparation we give our language model (LLM) before it goes out on the pitch (generates a response).

Once we’ve:

  • Scouted the field (knowledge source)

  • Passed around the ball (chunking)

  • Understood the coach's call (query)

  • Identified best teammates (retrieval)

Now we need to frame everything as a coherent context for our AI (Gemini) to work with.

# Combine all relevant document chunks into one context block
context = "\n\n".join([doc.page_content for doc in relevant_docs])

# Build the system prompt
system_prompt = f"""You are a helpful football assistant. Based on the following data, answer the user's question.

Context:
{context}

Question: {query}
"""

The system prompt:

  • Focuses the model on what's relevant

  • Embeds domain-specific tone (“football assistant”)

  • Combines retrieved data into a consumable form

You’re basically saying:

“Hey AI, based on this scouting report, here’s the question. Now give us your best tactical answer.”

Football Analogy:

| Football Element | RAG Equivalent | Explanation |
| --- | --- | --- |
| Tactical game plan | System prompt | Coach briefing the team with match intelligence |
| Whiteboard strategy | Structured context format | Clear, readable, organized instructions |
| Who we're playing & how | Query + relevant chunks | Gives the LLM exactly what it needs to perform well |

6. LLM = The Star Striker (Gemini AI)

Once the game plan is set and the players are in position, it's the striker's job to score the goal. You don't want your midfielder taking wild shots, and you definitely don't want your goalie trying to dribble past five defenders. In RAG, the striker is your LLM (Gemini): the one that generates the final answer using:

  • The user’s question

  • The relevant chunks (context) you retrieved

  • The system prompt (our tactical game plan)

Now it’s time to let Gemini take the shot.

import google.generativeai as genai

# Configure Gemini with your API key
genai.configure(api_key="your-gemini-api-key")

# Select the Gemini model
model = genai.GenerativeModel("gemini-pro")

# Send the system prompt as the input
response = model.generate_content(system_prompt)

# Print the generated answer
print("\n🤖 Answer from Gemini:")
print(response.text)
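One practical note: hard-coding the API key, as in these snippets, is fine for a quick demo, but in a real project you'd read it from an environment variable instead. A minimal sketch (the variable name GOOGLE_API_KEY is just a convention here; use whatever you export):

import os
import google.generativeai as genai

# Read the key from the environment instead of hard-coding it in source control
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])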

Full RAG Pipeline Recap (Football Style)

| Step | RAG Task | Football Equivalent |
| --- | --- | --- |
| Data Ingestion | Load markdown/Notion/PDF | Scouting report |
| Chunking | Break content into chunks | Passing the ball |
| User Query | User asks a question | Coach's command |
| Retrieval | Vector DB finds top docs | Midfielder finds the best play |
| System Prompt | Format input for LLM | Game plan + whiteboard |
| LLM Generation | Generate final answer | Striker scores the goal |

# RAG Football Assistant

from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Qdrant
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams
import google.generativeai as genai

# -----------------------------
# Step 1: Load Data (Scouting Report)
# -----------------------------
loader = TextLoader("football_facts.md")
documents = loader.load()

# -----------------------------
# Step 2: Chunk the Documents (Pass the Ball)
# -----------------------------
splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# -----------------------------
# Step 3: User Query (Coach’s Command)
# -----------------------------
query = "Who has scored the most goals in World Cup history?"

# -----------------------------
# Step 4: Vector Embedding + Qdrant Storage (Midfield Strategy)
# -----------------------------
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="your-gemini-api-key"  # Replace with your Gemini API key; better to use an environment variable.
)

qdrant = QdrantClient(host="localhost", port=6333)

collection_name = "football-facts"
qdrant.recreate_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=768, distance=Distance.COSINE)
)

vectorstore = Qdrant.from_documents(
    documents=chunks,
    embedding=embeddings,
    host="localhost",
    port=6333,
    collection_name=collection_name
)

relevant_docs = vectorstore.similarity_search(query, k=3)

# -----------------------------
# Step 5: Game Plan (System Prompt)
# -----------------------------
context = "\n".join([doc.page_content for doc in relevant_docs])
system_prompt = f"""
You are a helpful football assistant. Based on the following data, answer the user's question.

Context:
{context}

Question: {query}
"""

# -----------------------------
# Step 6: Final Shot! (LLM Response)
# -----------------------------
genai.configure(api_key="your-gemini-api-key")
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content(system_prompt)

print("\n🤖 Answer from Gemini:")
print(response.text)

Summary

This project demonstrates an end-to-end Retrieval-Augmented Generation (RAG) pipeline using Gemini AI and the Qdrant vector database, explained through a fun football analogy. Just like a coach uses scouting reports, we start by loading domain knowledge (e.g., football facts), split it into manageable "passes" (chunks), and use a user query (the coach's command) to retrieve the most relevant chunks via vector similarity search (the midfielder selecting the best play). These chunks are folded into a system prompt (the game plan), and finally the large language model (LLM), Gemini, responds like a striker taking the final shot. The setup ensures accurate, domain-aware answers without retraining the LLM.
