Mastering Retrieval-Augmented Generation (RAG): A Deep Dive with Gemini and Qdrant

Avinash Kumar
55 min read

Table of Contents

  • What is RAG and Why is it Important?

  • The Core Idea: Retrieval, Augmentation & Generation with Gemini

    • Types of RAG

      1. Based on Retrieval Type

        1. Dense Retrieval RAG: Semantic Search with Google Embeddings & Qdrant

        2. Sparse Retrieval RAG: Keyword Matching

        3. Hybrid Retrieval RAG: The Best of Both Worlds (Gemini/Qdrant + Sparse)

      2. Based on Augmentation Strategy

        1. Pre-Retrieval RAG: Retrieve Before Answering

        2. Post-Retrieval RAG: Verify After Answering

        3. Iterative Retrieval RAG: Step-by-Step Reasoning with Gemini

      3. Based on Response Generation

        1. Extractive RAG: Copy and Paste

        2. Abstractive RAG: Summarize and Explain with Gemini

        3. Mixed RAG: Combining Extraction and Abstraction with Gemini

    • Advanced & Specialized RAG Types

      1. Agent-Based RAG: Autonomous Exploration with Gemini Models

      2. Multi-Modal RAG: Beyond Text with Gemini Pro Vision/Gemini 1.5 Pro

      3. Memory-Augmented RAG: Remembering the Past

      4. Structured Data RAG: Databases as Knowledge Sources + Gemini for Text-to-SQL

      5. Graph-Based RAG: Relationships and Connections + Gemini for Text-to-GraphQuery

    • Conclusion: Choosing the Right RAG Approach

      • Key Considerations for Selecting a RAG Type

      • The Future of Retrieval-Augmented Generation


What is RAG and Why is it Important?

Concept: Retrieval-Augmented Generation (RAG) is a powerful AI framework designed to make Large Language Models (LLMs) like Google's Gemini more accurate, relevant, and trustworthy. It works by combining a retrieval system (like a search engine over your documents, often powered by a vector database like Qdrant) with a generative model (Gemini).

Instead of relying solely on the vast but potentially outdated or generic knowledge embedded within Gemini during its training, a RAG system first retrieves relevant snippets of information from an external, up-to-date knowledge base (your documents, databases, websites, etc.) related to the user's query. This retrieved information is then augmented into the prompt provided to Gemini, giving it the necessary context to generate a well-informed response.

Why is it Important?

  1. Reduces Hallucinations: By grounding responses in real data, RAG significantly decreases the chances of the LLM inventing facts.

  2. Access to Current Information: RAG systems can query knowledge bases that are continuously updated, overcoming the LLM's static knowledge cutoff.

  3. Domain-Specific Expertise: Allows LLMs to answer questions about niche topics or internal company knowledge without expensive fine-tuning.

  4. Transparency and Trust: RAG can often cite its sources, allowing users to verify the information.

  5. Cost-Effective: Often cheaper and faster than fine-tuning an LLM for specific knowledge domains.

The Core Idea: Retrieval, Augmentation & Generation with Gemini

  1. Retrieval: Given a user query, the system first converts the query into a vector embedding using a Google embedding model (e.g., text-embedding-004). It then searches a knowledge source, such as a Qdrant vector database containing document chunks previously embedded using the same model, to find the most semantically relevant pieces of information.

  2. Augmentation: The retrieved information (the context) is combined with the original user query. This forms an augmented prompt, specifically structured to guide the Gemini model. This prompt now contains both the user's question and the relevant external knowledge needed to answer it accurately.

  3. Generation: The augmented prompt is sent to a generative Gemini model (e.g., gemini-1.5-pro-latest or gemini-1.0-pro). Gemini uses the provided context along with its powerful language understanding and generation capabilities to create the final, context-aware response.

Code Snippet (Sample Code):

import os
import google.generativeai as genai
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, VectorParams, Distance

# ENV variables
QDRANT_HOST = "YOUR_QDRANT_HOST" 
QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"

# Initialize Gemini
genai.configure(api_key=GEMINI_API_KEY)

# Step 1: Initialize Qdrant client
qdrant = QdrantClient(
    url=QDRANT_HOST,
    api_key=QDRANT_API_KEY,
)

COLLECTION_NAME = "support-bot-docs"

# Step 2: Create collection (if not exists)
try:
    qdrant.get_collection(collection_name=COLLECTION_NAME)
    print(f"Collection '{COLLECTION_NAME}' already exists.")
except Exception:
    qdrant.create_collection(
        collection_name=COLLECTION_NAME,
        vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    )
    print(f"Collection '{COLLECTION_NAME}' created.")

# Step 3: Sample documents
docs = [
    {"id": "1", "text": "..."},
    {"id": "2", "text": "..."},
    {"id": "3", "text": "..."},
    {"....."}
]

# Step 4: Embed documents & upload to Qdrant
points = []
for doc in docs:
    response = genai.embed_content(model="models/embedding-001", content=doc["text"])
    embedding = response["embedding"]
    points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"]}))

qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
print("Documents upserted to Qdrant.")

# Step 5: Query
query = "Why ........?"

# Step 6: Embed query
query_response = genai.embed_content(model="models/embedding-001", content=query)
query_vector = query_response["embedding"]

# Step 7: Search Qdrant
hits = qdrant.search(
    collection_name=COLLECTION_NAME,
    query_vector=query_vector,
    limit=2,
)

# Step 8: Display Results
print("\nTop results for query:", query)
for hit in hits:
    print(f"- {hit.payload['text']} (Score: {hit.score:.4f})")

(Requires pip install qdrant-client google-generativeai rank-bm25 sentence-transformers)

Types of RAG

RAG systems aren't one-size-fits-all. They can be categorized based on how they perform retrieval, augmentation, and generation.

1. Based on Retrieval Type

1.1. Dense Retrieval RAG: Semantic Search with Google Embeddings & Qdrant

  • Concept: This approach leverages the power of semantic understanding. It uses dense vector embeddings (numerical representations capturing meaning) generated by models like Google's text-embedding-004 or models/embedding-001. It searches a dedicated vector database, Qdrant, for documents whose embeddings are semantically closest (e.g., using cosine similarity) to the query embedding. This focuses on matching meaning rather than just keywords.

  • Real-life Example: A customer support bot answering "Why is my order late?" by finding documents in a Qdrant collection related to "shipping delays," "delivery times," or "order status," using embeddings generated via the Gemini API.

  • Pros: Excellent at understanding the intent behind a query, even if keywords don't match exactly. Handles synonyms and related concepts well.

  • Cons: Requires generating embeddings for all documents (can be computationally intensive upfront). Depends heavily on the quality of the embedding model. Retrieval performance relies on efficient vector database indexing (like Qdrant's HNSW).

  • Flow Chart (Dense Retrieval):

  • Code Snippet (Dense Retrieval):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Step 1: Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      COLLECTION_NAME = "Dense-Retrieval"
    
      # Step 2: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 3: Sample documents
      docs = [
          {"id": "1", "text": "We are experiencing delays in shipping due to weather conditions."},
          {"id": "2", "text": "Shipping may take 5-7 business days during holiday seasons."},
          {"id": "3", "text": "Refunds are processed within 3-5 business days."},
      ]
    
      # Step 4: Embed documents & upload to Qdrant
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant.")
    
      # Step 5: Query
      query = "Why is my order late?"
    
      # Step 6: Embed query
      query_response = genai.embed_content(model="models/embedding-001", content=query)
      query_vector = query_response["embedding"]
    
      # Step 7: Search Qdrant
      hits = qdrant.search(
          collection_name=COLLECTION_NAME,
          query_vector=query_vector,
          limit=3,
      )
    
      # Step 8: Display Results
      print("\nTop results for query:", query)
      for hit in hits:
          print(f"- {hit.payload['text']} (Score: {hit.score:.4f})")
    

1.2. Sparse Retrieval RAG: Keyword Matching

  • Concept: Relies on traditional information retrieval algorithms like TF-IDF or BM25 (Okapi BM25 is popular). It finds documents primarily based on the presence, frequency, and rarity of specific keywords from the query. It's fast and very effective for queries containing unique identifiers, codes, or specific jargon. This method doesn't directly use Gemini for retrieval but complements it in hybrid systems.

  • Real-life Example: Searching internal technical documentation for a specific error code like ERR_CONN_REFUSED or a product SKU like XYZ-123.

  • Pros: Very fast for keyword lookups. Effective for exact matches, codes, and jargon. Doesn't require expensive embeddings. Mature and well-understood algorithms.

  • Cons: Fails to capture semantic meaning (synonyms, related concepts are missed). Performance degrades with ambiguous queries or lack of keyword overlap. Requires text preprocessing (tokenization, lowercasing).

  • Flow Chart (Sparse Retrieval):

  • Code Snippet (Sparse Retrieval):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      COLLECTION_NAME = "Sparse-Retrieval"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 2: Sample documents (expanded with a tech-related example)
      docs = [
          {"id": "1", "text": "We are experiencing delays in shipping due to weather conditions."},
          {"id": "2", "text": "Shipping may take 5-7 business days during holiday seasons."},
          {"id": "3", "text": "Refunds are processed within 3-5 business days."},
          {"id": "4", "text": "ERR_CONN_REFUSED: Check your network settings or firewall."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      # Tokenize documents for BM25 (split text into words)
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Query
      query = "ERR_CONN_REFUSED"
    
      # Step 6: BM25 Sparse Retrieval
      tokenized_query = query.lower().split()
      bm25_scores = bm25.get_scores(tokenized_query)
    
      # Get top documents with scores
      bm25_results = [
          {"text": docs[i]["text"], "score": bm25_scores[i]}
          for i in range(len(docs))
          if bm25_scores[i] > 0
      ]
      bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:3]
    
      # Step 7: Dense Retrieval with Qdrant
      query_response = genai.embed_content(model="models/embedding-001", content=query)
      query_vector = query_response["embedding"]
      dense_hits = qdrant.search(
          collection_name=COLLECTION_NAME,
          query_vector=query_vector,
          limit=3,
      )
    
      # Step 8: Display Results
      print("\nBM25 Sparse Retrieval Results for query:", query)
      for result in bm25_results:
          print(f"- {result['text']} (Score: {result['score']:.4f})")
    
      print("\nDense Retrieval Results for query:", query)
      for hit in dense_hits:
          print(f"- {hit.payload['text']} (Score: {hit.score:.4f})")
    

1.3. Hybrid Retrieval RAG: The Best of Both Worlds (Gemini/Qdrant + Sparse)

  • Concept: Combines the strengths of both dense (semantic) and sparse (keyword) retrieval. It runs both types of searches – Dense using Google Embeddings + Qdrant, and Sparse using BM25 – and then merges the results. A fusion or re-ranking step (like Reciprocal Rank Fusion - RRF, sketched after the code snippet below, or a cross-encoder model) is often used to produce a final, balanced list that benefits from both semantic relevance and keyword precision.

  • Real-life Example: A legal research assistant searching for cases related to "intellectual property disputes involving generative AI" needs both the semantic understanding of the concepts (dense) and the ability to pinpoint specific legal terms or case names like "fair use" (sparse).

  • Pros: Leverages both semantic understanding and keyword precision. Generally yields more relevant results across a wider range of query types. Robust to queries that might fail with only one method.

  • Cons: More complex to implement and tune. Requires managing both dense (Qdrant) and sparse (e.g., BM25 index) systems. Fusion/re-ranking adds computational overhead. Requires careful score normalization if combining scores directly.

  • Flow Chart (Hybrid Retrieval):

  • Code Snippet (Hybrid Retrieval):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
      from sentence_transformers import CrossEncoder
      import numpy as np
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      # Initialize cross-encoder for re-ranking
      cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    
      COLLECTION_NAME = "Hybrid-Retrieval"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 2: Sample documents (tailored to query)
      docs = [
          {"id": "1", "text": "The MYC gene regulates cell growth and is influenced by environmental stressors."},
          {"id": "2", "text": "Climate adaptation in plants involves genetic changes, including MYC gene expression."},
          {"id": "3", "text": "Shipping delays may occur due to extreme weather conditions."},
          {"id": "4", "text": "MYC gene mutations are linked to cancer, not climate adaptation."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Query
      query = "MYC gene in climate adaptation"
    
      # Step 6: Dense Retrieval with Qdrant
      query_response = genai.embed_content(model="models/embedding-001", content=query)
      query_vector = query_response["embedding"]
      dense_hits = qdrant.search(
          collection_name=COLLECTION_NAME,
          query_vector=query_vector,
          limit=4,  # Retrieve more to allow re-ranking
      )
    
      # Step 7: BM25 Sparse Retrieval
      tokenized_query = query.lower().split()
      bm25_scores = bm25.get_scores(tokenized_query)
      bm25_results = [
          {"text": docs[i]["text"], "score": bm25_scores[i]}
          for i in range(len(docs))
          if bm25_scores[i] > 0
      ]
      bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:4]
    
      # Step 8: Hybrid Retrieval
      # Normalize scores
      dense_scores = {hit.payload["text"]: hit.score for hit in dense_hits}
      bm25_scores = {result["text"]: result["score"] for result in bm25_results}
      all_texts = set(dense_scores.keys()).union(bm25_scores.keys())
    
      max_dense = max(dense_scores.values(), default=1.0)
      max_bm25 = max(bm25_scores.values(), default=1.0)
    
      # Combine scores (weighted: 60% dense, 40% sparse)
      hybrid_results = {}
      for text in all_texts:
          dense_score = dense_scores.get(text, 0) / max_dense
          bm25_score = bm25_scores.get(text, 0) / max_bm25
          hybrid_score = 0.6 * dense_score + 0.4 * bm25_score
          hybrid_results[text] = hybrid_score
    
      # Step 9: Re-ranking with Cross-Encoder
      rerank_inputs = [[query, text] for text in hybrid_results.keys()]
      rerank_scores = cross_encoder.predict(rerank_inputs)
      reranked_results = [
          {"text": text, "score": rerank_scores[i]}
          for i, text in enumerate(hybrid_results.keys())
      ]
      reranked_results = sorted(reranked_results, key=lambda x: x["score"], reverse=True)[:3]
    
      # Step 10: Display Results
      print("\nDense Retrieval Results for query:", query)
      for hit in dense_hits:
          print(f"- {hit.payload['text']} (Score: {hit.score:.4f})")
    
      print("\nBM25 Sparse Retrieval Results for query:", query)
      for result in bm25_results:
          print(f"- {result['text']} (Score: {result['score']:.4f})")
    
      print("\nHybrid Retrieval Results for query:", query)
      for text, score in sorted(hybrid_results.items(), key=lambda x: x[1], reverse=True)[:3]:
          print(f"- {text} (Hybrid Score: {score:.4f})")
    
      print("\nRe-ranked Hybrid Results for query:", query)
      for result in reranked_results:
          print(f"- {result['text']} (Re-ranked Score: {result['score']:.4f})")
    

2. Based on Augmentation Strategy

When and how is the retrieved information incorporated?

2.1. Pre-Retrieval RAG: Retrieve Before Answering

  • Concept: This is the standard and most common RAG workflow. Retrieve relevant context first using any retrieval method (Dense/Qdrant, Sparse, Hybrid), then augment the prompt with this context, and finally generate the answer using Gemini.

  • Real-life Example: A Q&A bot answering "What are the side effects of Medication X?" by first retrieving relevant medical abstracts from Qdrant and then asking Gemini to summarize them based on the query.

  • Pros: Simple and intuitive workflow. Generation benefits directly from the retrieved context. Most common implementation.

  • Cons: If the initial query is ambiguous, irrelevant context might be retrieved, potentially leading the LLM astray (a query-rewriting sketch to mitigate this follows the code snippet below).

  • Flow Chart (Pre-Retrieval):

  • Code Snippet (Pre-Retrieval):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
      from sentence_transformers import CrossEncoder
      import numpy as np
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      # Initialize cross-encoder for re-ranking
      cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    
      COLLECTION_NAME = "Pre-Retrieval"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 2: Sample documents (medical context for Medication X)
      docs = [
          {"id": "1", "text": "Medication X may cause nausea, dizziness, and fatigue as common side effects."},
          {"id": "2", "text": "Rare side effects of Medication X include allergic reactions and liver issues."},
          {"id": "3", "text": "Medication X is used to treat hypertension but may cause headaches in some patients."},
          {"id": "4", "text": "Always consult a doctor before stopping Medication X due to side effects."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Query
      query = "Medication X side effects"
    
      # Step 6: Dense Retrieval with Qdrant (Updated to query_points)
      query_response = genai.embed_content(model="models/embedding-001", content=query)
      query_vector = query_response["embedding"]
      dense_hits = qdrant.query_points(
          collection_name=COLLECTION_NAME,
          query=query_vector,
          limit=4,
          with_payload=True
      ).points
    
      # Step 7: BM25 Sparse Retrieval
      tokenized_query = query.lower().split()
      bm25_scores = bm25.get_scores(tokenized_query)
      bm25_results = [
          {"text": docs[i]["text"], "score": bm25_scores[i]}
          for i in range(len(docs))
          if bm25_scores[i] > 0
      ]
      bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:4]
    
      # Step 8: Hybrid Retrieval
      dense_scores = {hit.payload["text"]: hit.score for hit in dense_hits}
      bm25_scores = {result["text"]: result["score"] for result in bm25_results}
      all_texts = set(dense_scores.keys()).union(bm25_scores.keys())
    
      max_dense = max(dense_scores.values(), default=1.0)
      max_bm25 = max(bm25_scores.values(), default=1.0)
    
      hybrid_results = {}
      for text in all_texts:
          dense_score = dense_scores.get(text, 0) / max_dense
          bm25_score = bm25_scores.get(text, 0) / max_bm25
          hybrid_score = 0.6 * dense_score + 0.4 * bm25_score
          hybrid_results[text] = hybrid_score
    
      # Step 9: Re-ranking with Cross-Encoder
      rerank_inputs = [[query, text] for text in hybrid_results.keys()]
      rerank_scores = cross_encoder.predict(rerank_inputs)
      reranked_results = [
          {"text": text, "score": rerank_scores[i]}
          for i, text in enumerate(hybrid_results.keys())
      ]
      reranked_results = sorted(reranked_results, key=lambda x: x["score"], reverse=True)[:3]
    
      # Step 10: Generate Answer with Gemini (RAG)
      context = "\n".join([result["text"] for result in reranked_results])
      prompt = f"Based on the following context, provide a concise answer to the query: {query}\n\nContext:\n{context}\n\nAnswer:"
    
      # Configure Gemini model for generation
      model = genai.GenerativeModel("gemini-1.5-pro")
      response = model.generate_content(prompt)
    
      # Step 11: Display Results
      print("\nDense Retrieval Results for query:", query)
      for hit in dense_hits:
          print(f"- {hit.payload['text']} (Score: {hit.score:.4f})")
    
      print("\nBM25 Sparse Retrieval Results for query:", query)
      for result in bm25_results:
          print(f"- {result['text']} (Score: {result['score']:.4f})")
    
      print("\nHybrid Retrieval Results for query:", query)
      for text, score in sorted(hybrid_results.items(), key=lambda x: x[1], reverse=True)[:3]:
          print(f"- {text} (Hybrid Score: {score:.4f})")
    
      print("\nRe-ranked Hybrid Results for query:", query)
      for result in reranked_results:
          print(f"- {result['text']} (Re-ranked Score: {result['score']:.4f})")
    
      print("\nGenerated Answer:")
      print(response.text)
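
  • Query Rewriting (Sketch): One way to mitigate the ambiguity noted in the Cons is to have Gemini rewrite the query before retrieval. A minimal sketch, assuming the model and genai objects from the snippet above; the raw query and prompt wording are illustrative:

      # Hypothetical ambiguous user input (illustrative)
      raw_query = "Is it safe?"

      # Ask Gemini to turn it into a self-contained search query before retrieval
      rewrite_prompt = (
          "Rewrite the following user question as a short, self-contained search query "
          f"about Medication X and its side effects:\n{raw_query}"
      )
      rewritten_query = model.generate_content(rewrite_prompt).text.strip()
      print("Rewritten query:", rewritten_query)

      # The rewritten query is then embedded and used for retrieval exactly as in Step 6 above
      rewritten_vector = genai.embed_content(model="models/embedding-001", content=rewritten_query)["embedding"]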
    

2.2. Post-Retrieval RAG: Verify After Answering

  • Concept: Generate an initial answer using only Gemini's internal knowledge first. Then, use the original query and/or the initial answer to perform retrieval (e.g., from Qdrant). Finally, use Gemini again, providing the initial answer and the retrieved evidence, asking it to verify, correct, or refine the initial answer based on the evidence.

  • Real-life Example: A fact-checking system generates an initial answer to "Is Mars habitable for humans right now?". It then retrieves documents about Mars' atmosphere and conditions from Qdrant and asks Gemini to confirm or correct the initial answer based on the retrieved sources.

  • Pros: Can provide a very fast initial response if latency is critical. Useful for fact-checking or verifying LLM claims against known sources. Can potentially guide retrieval more effectively using the initial answer.

  • Cons: Requires two LLM calls (generation + verification), increasing latency and cost compared to Pre-Retrieval. The initial answer might be completely wrong or hallucinated.

  • Flow Chart (Post-Retrieval):

  • Code Snippet (Post-Retrieval):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
      from sentence_transformers import CrossEncoder
      import numpy as np
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      # Initialize cross-encoder for re-ranking
      cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    
      COLLECTION_NAME = "Post-Retrieval"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 2: Sample documents (Mars habitability context)
      docs = [
          {"id": "1", "text": "Mars habitability is limited by its thin atmosphere and lack of liquid water."},
          {"id": "2", "text": "Evidence of ancient water flows on Mars suggests past habitability."},
          {"id": "3", "text": "Current Mars missions search for microbial life in subsurface ice."},
          {"id": "4", "text": "Terraforming Mars could make it habitable, but technology is decades away."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Query
      query = "Mars habitability"
    
      # Step 6: Initial Answer Generation
      initial_prompt = f"Provide a brief answer to the query: {query}"
      model = genai.GenerativeModel("gemini-1.5-pro")
      initial_response = model.generate_content(initial_prompt)
      initial_answer = initial_response.text
    
      # Step 7: Dense Retrieval with Qdrant
      query_response = genai.embed_content(model="models/embedding-001", content=query)
      query_vector = query_response["embedding"]
      dense_hits = qdrant.query_points(
          collection_name=COLLECTION_NAME,
          query=query_vector,
          limit=4,
          with_payload=True
      ).points
    
      # Step 8: BM25 Sparse Retrieval
      tokenized_query = query.lower().split()
      bm25_scores = bm25.get_scores(tokenized_query)
      bm25_results = [
          {"text": docs[i]["text"], "score": bm25_scores[i]}
          for i in range(len(docs))
          if bm25_scores[i] > 0
      ]
      bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:4]
    
      # Step 9: Hybrid Retrieval
      dense_scores = {hit.payload["text"]: hit.score for hit in dense_hits}
      bm25_scores = {result["text"]: result["score"] for result in bm25_results}
      all_texts = set(dense_scores.keys()).union(bm25_scores.keys())
    
      max_dense = max(dense_scores.values(), default=1.0)
      max_bm25 = max(bm25_scores.values(), default=1.0)
    
      hybrid_results = {}
      for text in all_texts:
          dense_score = dense_scores.get(text, 0) / max_dense
          bm25_score = bm25_scores.get(text, 0) / max_bm25
          hybrid_score = 0.6 * dense_score + 0.4 * bm25_score
          hybrid_results[text] = hybrid_score
    
      # Step 10: Re-ranking with Cross-Encoder
      rerank_inputs = [[query, text] for text in hybrid_results.keys()]
      rerank_scores = cross_encoder.predict(rerank_inputs)
      reranked_results = [
          {"text": text, "score": rerank_scores[i]}
          for i, text in enumerate(hybrid_results.keys())
      ]
      reranked_results = sorted(reranked_results, key=lambda x: x["score"], reverse=True)[:3]
    
      # Step 11: Refine Answer with Retrieved Evidence
      context = "\n".join([result["text"] for result in reranked_results])
      refine_prompt = f"""
      Query: {query}
      Initial Answer: {initial_answer}
      Context: {context}
    
      Refine the initial answer based on the provided context to ensure accuracy and include relevant details. If the initial answer contains inaccuracies, correct them. Provide a concise, factual response.
      Answer:
      """
      refined_response = model.generate_content(refine_prompt)
      refined_answer = refined_response.text
    
      # Step 12: Display Results
      print("\nInitial Answer:")
      print(initial_answer)
    
      print("\nDense Retrieval Results for query:", query)
      for hit in dense_hits:
          print(f"- {hit.payload['text']} (Score: {hit.score:.4f})")
    
      print("\nBM25 Sparse Retrieval Results for query:", query)
      for result in bm25_results:
          print(f"- {result['text']} (Score: {result['score']:.4f})")
    
      print("\nHybrid Retrieval Results for query:", query)
      for text, score in sorted(hybrid_results.items(), key=lambda x: x[1], reverse=True)[:3]:
          print(f"- {text} (Hybrid Score: {score:.4f})")
    
      print("\nRe-ranked Hybrid Results for query:", query)
      for result in reranked_results:
          print(f"- {result['text']} (Re-ranked Score: {result['score']:.4f})")
    
      print("\nRefined Answer:")
      print(refined_answer)
    

2.3. Iterative Retrieval RAG: Step-by-Step Reasoning with Gemini

  • Concept: Designed for complex queries that require multiple steps or pieces of information. It involves multiple rounds of interaction between retrieval and generation. Gemini might first generate a sub-query or identify missing information -> the system retrieves relevant context for that -> Gemini processes it and generates the next step or another sub-query -> retrieve again -> ... until a final answer can be synthesized. Gemini acts as the reasoner guiding the retrieval process (a sketch of letting Gemini propose the sub-queries follows the code snippet below).

  • Real-life Example: Answering "Compare the economic impacts of renewable energy adoption in Germany versus California, considering subsidies and grid stability." might involve: Retrieve German policies -> Gemini asks for California policies -> Retrieve CA policies -> Gemini asks for grid stability data for both -> Retrieve data -> Gemini synthesizes the final comparison.

  • Pros: Can break down complex problems into manageable steps. Allows the LLM to guide the information gathering process dynamically. Potentially more accurate for multi-faceted queries.

  • Cons: Significantly more complex to implement the control flow. Requires multiple LLM calls and retrieval steps, increasing latency and cost substantially. Risk of getting stuck in loops or irrelevant retrieval paths.

  • Flow Chart (Iterative Retrieval):

  • Code Snippet (Iterative Retrieval):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
      from sentence_transformers import CrossEncoder
      import numpy as np
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      # Initialize cross-encoder for re-ranking
      cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    
      COLLECTION_NAME = "Iterative-Retrival"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 2: Sample documents (renewable energy in Germany and California)
      docs = [
          {"id": "1", "text": "Germany’s renewable energy mix includes 46% wind and solar in 2023, driven by Energiewende policies."},
          {"id": "2", "text": "California aims for 60% renewable energy by 2030, with heavy investment in solar farms."},
          {"id": "3", "text": "Germany’s feed-in tariffs have boosted solar and wind adoption since the 2000s."},
          {"id": "4", "text": "California’s renewable energy faces grid reliability challenges due to solar intermittency."},
          {"id": "5", "text": "Germany leads in offshore wind, while California focuses on rooftop solar."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Query and Sub-Queries
      main_query = "comparing renewable energy in Germany vs. California"
      sub_queries = [
          "renewable energy in Germany",
          "renewable energy in California",
          "comparison of renewable energy in Germany and California"
      ]
    
      # Step 6: Iterative Retrieval-Generation
      model = genai.GenerativeModel("gemini-1.5-pro")
      partial_answers = []
    
      for cycle, sub_query in enumerate(sub_queries, 1):
          print(f"\nCycle {cycle}: Processing sub-query: {sub_query}")
    
          # Step 6.1: Dense Retrieval with Qdrant
          query_response = genai.embed_content(model="models/embedding-001", content=sub_query)
          query_vector = query_response["embedding"]
          dense_hits = qdrant.query_points(
              collection_name=COLLECTION_NAME,
              query=query_vector,
              limit=4,
              with_payload=True
          ).points
    
          # Step 6.2: BM25 Sparse Retrieval
          tokenized_query = sub_query.lower().split()
          bm25_scores = bm25.get_scores(tokenized_query)
          bm25_results = [
              {"text": docs[i]["text"], "score": bm25_scores[i]}
              for i in range(len(docs))
              if bm25_scores[i] > 0
          ]
          bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:4]
    
          # Step 6.3: Hybrid Retrieval
          dense_scores = {hit.payload["text"]: hit.score for hit in dense_hits}
          bm25_scores = {result["text"]: result["score"] for result in bm25_results}
          all_texts = set(dense_scores.keys()).union(bm25_scores.keys())
    
          max_dense = max(dense_scores.values(), default=1.0)
          max_bm25 = max(bm25_scores.values(), default=1.0)
    
          hybrid_results = {}
          for text in all_texts:
              dense_score = dense_scores.get(text, 0) / max_dense
              bm25_score = bm25_scores.get(text, 0) / max_bm25
              hybrid_score = 0.6 * dense_score + 0.4 * bm25_score
              hybrid_results[text] = hybrid_score
    
          # Step 6.4: Re-ranking with Cross-Encoder
          rerank_inputs = [[sub_query, text] for text in hybrid_results.keys()]
          rerank_scores = cross_encoder.predict(rerank_inputs)
          reranked_results = [
              {"text": text, "score": rerank_scores[i]}
              for i, text in enumerate(hybrid_results.keys())
          ]
          reranked_results = sorted(reranked_results, key=lambda x: x["score"], reverse=True)[:3]
    
          # Step 6.5: Generate Partial Answer
          context = "\n".join([result["text"] for result in reranked_results])
          prompt = f"""
          Based on the following context, provide a concise answer to the query: {sub_query}
          Context: {context}
          Answer:
          """
          response = model.generate_content(prompt)
          partial_answer = response.text
          partial_answers.append(partial_answer)
    
          # Display Cycle Results
          print(f"\nRe-ranked Hybrid Results for sub-query: {sub_query}")
          for result in reranked_results:
              print(f"- {result['text']} (Re-ranked Score: {result['score']:.4f})")
          print(f"\nPartial Answer: {partial_answer}")
    
      # Step 7: Final Synthesis
      synthesis_prompt = f"""
      Query: {main_query}
      Partial Answers:
      1. {partial_answers[0]}
      2. {partial_answers[1]}
      3. {partial_answers[2]}
    
      Synthesize the partial answers into a comprehensive, concise response comparing renewable energy in Germany and California. Highlight key similarities, differences, and notable policies or challenges.
      Answer:
      """
      final_response = model.generate_content(synthesis_prompt)
      final_answer = final_response.text
    
      # Step 8: Display Final Answer
      print("\nFinal Synthesized Answer:")
      print(final_answer)
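
  • Dynamic Sub-Queries (Sketch): The snippet above hard-codes the sub-queries; as described in the Concept, Gemini itself can propose them. A minimal sketch, assuming the model object and main_query from above; the prompt and line-based parsing are illustrative:

      # Ask Gemini to decompose the main query into short retrieval sub-queries
      decompose_prompt = (
          "Break the following question into 2-4 short, self-contained search queries, "
          f"one per line, without numbering:\n{main_query}"
      )
      decomposition = model.generate_content(decompose_prompt)
      generated_sub_queries = [line.strip() for line in decomposition.text.splitlines() if line.strip()]
      print("Gemini-proposed sub-queries:", generated_sub_queries)
      # These could replace the hard-coded sub_queries list in Step 5 above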
    

3. Based on Response Generation

How is the final answer constructed by Gemini?

3.1. Extractive RAG: Copy and Paste

  • Concept: The final answer consists primarily, or entirely, of direct quotes or sentences extracted verbatim from the retrieved documents. Gemini's role is to identify and select the most relevant passage(s) that directly answer the query.

  • Real-life Example: Finding the precise definition of "osmosis" from a retrieved scientific glossary entry stored in Qdrant.

  • Pros: High fidelity to the source material. Reduces the risk of the LLM misinterpreting or hallucinating information. Good for providing exact definitions or evidence.

  • Cons: Answers can be less natural or conversational. May fail if the exact answer isn't present verbatim in the context. Might extract overly long or irrelevant surrounding text.

  • Flow chart (Extractive):

  • Code Snippet (Extractive RAG):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
      from sentence_transformers import CrossEncoder
      import numpy as np
      import re
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      # Initialize cross-encoder for re-ranking
      cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    
      COLLECTION_NAME = "Extractive"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 2: Sample documents (osmosis context)
      docs = [
          {"id": "1", "text": "Osmosis is the diffusion of water molecules across a selectively permeable membrane from an area of higher water concentration to an area of lower water concentration."},
          {"id": "2", "text": "In biology, osmosis plays a critical role in maintaining cell hydration and nutrient transport."},
          {"id": "3", "text": "Osmosis differs from active transport, which requires energy to move substances against a concentration gradient."},
          {"id": "4", "text": "The process of osmosis is essential for plant roots to absorb water from the soil."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Query
      query = "osmosis definition"
    
      # Step 6: Dense Retrieval with Qdrant
      query_response = genai.embed_content(model="models/embedding-001", content=query)
      query_vector = query_response["embedding"]
      dense_hits = qdrant.query_points(
          collection_name=COLLECTION_NAME,
          query=query_vector,
          limit=4,
          with_payload=True
      ).points
    
      # Step 7: BM25 Sparse Retrieval
      tokenized_query = query.lower().split()
      bm25_scores = bm25.get_scores(tokenized_query)
      bm25_results = [
          {"text": docs[i]["text"], "score": bm25_scores[i]}
          for i in range(len(docs))
          if bm25_scores[i] > 0
      ]
      bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:4]
    
      # Step 8: Hybrid Retrieval
      dense_scores = {hit.payload["text"]: hit.score for hit in dense_hits}
      bm25_scores = {result["text"]: result["score"] for result in bm25_results}
      all_texts = set(dense_scores.keys()).union(bm25_scores.keys())
    
      max_dense = max(dense_scores.values(), default=1.0)
      max_bm25 = max(bm25_scores.values(), default=1.0)
    
      hybrid_results = {}
      for text in all_texts:
          dense_score = dense_scores.get(text, 0) / max_dense
          bm25_score = bm25_scores.get(text, 0) / max_bm25
          hybrid_score = 0.6 * dense_score + 0.4 * bm25_score
          hybrid_results[text] = hybrid_score
    
      # Step 9: Re-ranking with Cross-Encoder
      rerank_inputs = [[query, text] for text in hybrid_results.keys()]
      rerank_scores = cross_encoder.predict(rerank_inputs)
      reranked_results = [
          {"text": text, "score": rerank_scores[i]}
          for i, text in enumerate(hybrid_results.keys())
      ]
      reranked_results = sorted(reranked_results, key=lambda x: x["score"], reverse=True)[:3]
    
      # Step 10: Snippet Extraction
      # Extract sentences containing "osmosis" and rank by relevance to "definition"
      snippets = []
      for result in reranked_results:
          text = result["text"]
          # Split text into sentences
          sentences = re.split(r'(?<=[.!?])\s+', text)
          for sentence in sentences:
              if "osmosis" in sentence.lower() and any(word in sentence.lower() for word in ["is", "defined", "definition"]):
                  snippets.append({"text": sentence.strip(), "score": result["score"]})
    
      # Sort snippets by score and select the top one (or more if needed)
      snippets = sorted(snippets, key=lambda x: x["score"], reverse=True)[:1]
    
      # Step 11: Optional Validation with Gemini
      # Use Gemini to format or confirm the snippet (minimal generation)
      model = genai.GenerativeModel("gemini-1.5-pro")
      if snippets:
          snippet_text = snippets[0]["text"]
          prompt = f"""
          Query: {query}
          Extracted Snippet: {snippet_text}
    
          Format the snippet as a quoted definition for the query. Ensure the text remains verbatim and add minimal context if needed.
          Answer:
          """
          response = model.generate_content(prompt)
          final_answer = response.text
      else:
          final_answer = "No exact definition of osmosis found in the provided documents."
    
      # Step 12: Display Results
      print("\nDense Retrieval Results for query:", query)
      for hit in dense_hits:
          print(f"- {hit.payload['text']} (Score: {hit.score:.4f})")
    
      print("\nBM25 Sparse Retrieval Results for query:", query)
      for result in bm25_results:
          print(f"- {result['text']} (Score: {result['score']:.4f})")
    
      print("\nHybrid Retrieval Results for query:", query)
      for text, score in sorted(hybrid_results.items(), key=lambda x: x[1], reverse=True)[:3]:
          print(f"- {text} (Hybrid Score: {score:.4f})")
    
      print("\nRe-ranked Hybrid Results for query:", query)
      for result in reranked_results:
          print(f"- {result['text']} (Re-ranked Score: {result['score']:.4f})")
    
      print("\nExtracted Answer:")
      print(final_answer)
    

3.2. Abstractive RAG: Summarize and Explain with Gemini

  • Concept: This is the most common generation approach in RAG. Gemini synthesizes information from one or multiple retrieved context snippets and generates a new response in its own words. This allows for summarization, explanation, comparison, and more conversational answers.

  • Real-life Example: Asking Gemini to summarize several retrieved news articles about AI advancements in 2025 into a concise paragraph, based on context retrieved from Qdrant.

  • Pros: Produces natural, fluent, and conversational responses. Can synthesize information from multiple sources. More flexible than pure extraction.

  • Cons: Higher risk of hallucination or misinterpretation compared to extraction if the context is noisy or the LLM doesn't follow instructions precisely. May lose source fidelity.

  • Flow Chart (Abstractive) :

  • Code Snippet (Abstractive RAG):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
      from sentence_transformers import CrossEncoder
      import numpy as np
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      # Initialize cross-encoder for re-ranking
      cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    
      COLLECTION_NAME = "Abstractive"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 2: Sample documents (news articles on AI advancements in 2025)
      docs = [
          {"id": "1", "text": "In 2025, AI models achieved breakthroughs in multimodal processing, integrating text, images, and audio for applications in healthcare and autonomous vehicles."},
          {"id": "2", "text": "Major tech companies in 2025 invested heavily in quantum AI, promising faster computation for complex problems like climate modeling."},
          {"id": "3", "text": "Ethical AI frameworks gained traction in 2025, with new regulations in Europe to ensure transparency in AI decision-making."},
          {"id": "4", "text": "AI-driven personalized education platforms expanded in 2025, tailoring curricula to individual student needs."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Query
      query = "summarize AI advancements in 2025"
    
      # Step 6: Dense Retrieval with Qdrant
      query_response = genai.embed_content(model="models/embedding-001", content=query)
      query_vector = query_response["embedding"]
      dense_hits = qdrant.query_points(
          collection_name=COLLECTION_NAME,
          query=query_vector,
          limit=4,
          with_payload=True
      ).points
    
      # Step 7: BM25 Sparse Retrieval
      tokenized_query = query.lower().split()
      bm25_scores = bm25.get_scores(tokenized_query)
      bm25_results = [
          {"text": docs[i]["text"], "score": bm25_scores[i]}
          for i in range(len(docs))
          if bm25_scores[i] > 0
      ]
      bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:4]
    
      # Step 8: Hybrid Retrieval
      dense_scores = {hit.payload["text"]: hit.score for hit in dense_hits}
      bm25_scores = {result["text"]: result["score"] for result in bm25_results}
      all_texts = set(dense_scores.keys()).union(bm25_scores.keys())
    
      max_dense = max(dense_scores.values(), default=1.0)
      max_bm25 = max(bm25_scores.values(), default=1.0)
    
      hybrid_results = {}
      for text in all_texts:
          dense_score = dense_scores.get(text, 0) / max_dense
          bm25_score = bm25_scores.get(text, 0) / max_bm25
          hybrid_score = 0.6 * dense_score + 0.4 * bm25_score
          hybrid_results[text] = hybrid_score
    
      # Step 9: Re-ranking with Cross-Encoder
      rerank_inputs = [[query, text] for text in hybrid_results.keys()]
      rerank_scores = cross_encoder.predict(rerank_inputs)
      reranked_results = [
          {"text": text, "score": rerank_scores[i]}
          for i, text in enumerate(hybrid_results.keys())
      ]
      reranked_results = sorted(reranked_results, key=lambda x: x["score"], reverse=True)[:3]
    
      # Step 10: Abstractive Generation with Gemini
      context = "\n".join([result["text"] for result in reranked_results])
      prompt = f"""
      Query: {query}
      Context: {context}
    
      Summarize the key AI advancements in 2025 based on the provided context. Provide a concise, coherent response that captures the main points without quoting verbatim.
      Answer:
      """
      model = genai.GenerativeModel("gemini-1.5-pro")
      response = model.generate_content(prompt)
      summary = response.text
    
      # Step 11: Display Results
      print("\nDense Retrieval Results for query:", query)
      for hit in dense_hits:
          print(f"- {hit.payload['text']} (Score: {hit.score:.4f})")
    
      print("\nBM25 Sparse Retrieval Results for query:", query)
      for result in bm25_results:
          print(f"- {result['text']} (Score: {result['score']:.4f})")
    
      print("\nHybrid Retrieval Results for query:", query)
      for text, score in sorted(hybrid_results.items(), key=lambda x: x[1], reverse=True)[:3]:
          print(f"- {text} (Hybrid Score: {score:.4f})")
    
      print("\nRe-ranked Hybrid Results for query:", query)
      for result in reranked_results:
          print(f"- {result['text']} (Re-ranked Score: {result['score']:.4f})")
    
      print("\nSummarized Answer:")
      print(summary)
    

3.3. Mixed RAG: Combining Extraction and Abstraction with Gemini

  • Concept: The generated response includes both synthesized parts (Gemini explaining or summarizing) and direct quotes from the retrieved context, often with citations (like Document IDs retrieved from Qdrant payloads). This strikes a balance between providing a natural, fluent answer and grounding it with specific evidence from the sources.

  • Real-life Example: A financial analyst asking Gemini to summarize the main risks mentioned in retrieved company reports (stored in Qdrant), while also quoting specific risk factor statements [Doc ID: report_1].

  • Pros: Balances natural language fluency with source fidelity and evidence. Improves transparency and trust by providing direct quotes with citations.

  • Cons: Requires more complex prompting to instruct the LLM to both summarize and quote appropriately, and the LLM may not follow both instructions perfectly (a simple quote-grounding check is sketched after the code snippet below). Requires retrieved context to carry associated IDs or sources.

  • Flow Chart (Mixed RAG):

  • Code Snippet (Mixed RAG):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
      from sentence_transformers import CrossEncoder
      import numpy as np
      import re
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      # Initialize cross-encoder for re-ranking
      cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    
      COLLECTION_NAME = "mixed"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          # print(f"Collection '{COLLECTION_NAME}' created.)
    
      # Step 2: Sample documents (AI risks context)
      docs = [
          {"id": "1", "text": "AI systems pose ethical risks, including bias in decision-making. 'Algorithms can perpetuate existing inequalities if not carefully designed,' warns a 2025 ethics report."},
          {"id": "2", "text": "Technical risks of AI include system failures and vulnerabilities to hacking. 'A single flaw in AI could lead to catastrophic consequences,' notes a cybersecurity expert."},
          {"id": "3", "text": "Societal risks from AI involve job displacement and privacy erosion. 'Automation may disrupt 30% of jobs by 2030,' predicts an economic study."},
          {"id": "4", "text": "Regulatory gaps in AI governance increase risks of misuse. 'Without global standards, AI could be weaponized,' states a policy brief."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"], "doc_id": doc["id"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Query
      query = "summarize AI risks with quoted examples"
    
      # Step 6: Dense Retrieval with Qdrant
      query_response = genai.embed_content(model="models/embedding-001", content=query)
      query_vector = query_response["embedding"]
      dense_hits = qdrant.query_points(
          collection_name=COLLECTION_NAME,
          query=query_vector,
          limit=4,
          with_payload=True
      ).points
    
      # Step 7: BM25 Sparse Retrieval
      tokenized_query = query.lower().split()
      bm25_scores = bm25.get_scores(tokenized_query)
      bm25_results = [
          {"text": docs[i]["text"], "score": bm25_scores[i], "doc_id": docs[i]["id"]}
          for i in range(len(docs))
          if bm25_scores[i] > 0
      ]
      bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:4]
    
      # Step 8: Hybrid Retrieval
      dense_scores = {hit.payload["text"]: hit.score for hit in dense_hits}
      bm25_scores = {result["text"]: result["score"] for result in bm25_results}
      all_texts = set(dense_scores.keys()).union(bm25_scores.keys())
    
      max_dense = max(dense_scores.values(), default=1.0)
      max_bm25 = max(bm25_scores.values(), default=1.0)
    
      hybrid_results = {}
      doc_ids = {hit.payload["text"]: hit.payload["doc_id"] for hit in dense_hits}  # Map text to doc_id
      for text in all_texts:
          dense_score = dense_scores.get(text, 0) / max_dense
          bm25_score = bm25_scores.get(text, 0) / max_bm25
          hybrid_score = 0.6 * dense_score + 0.4 * bm25_score
          hybrid_results[text] = {"score": hybrid_score, "doc_id": doc_ids.get(text, "unknown")}
    
      # Step 9: Re-ranking with Cross-Encoder
      rerank_inputs = [[query, text] for text in hybrid_results.keys()]
      rerank_scores = cross_encoder.predict(rerank_inputs)
      reranked_results = [
          {"text": text, "score": rerank_scores[i], "doc_id": hybrid_results[text]["doc_id"]}
          for i, text in enumerate(hybrid_results.keys())
      ]
      reranked_results = sorted(reranked_results, key=lambda x: x["score"], reverse=True)[:3]
    
      # Step 10: Snippet Extraction for Quotes
      quotes = []
      for result in reranked_results:
          text = result["text"]
          doc_id = result["doc_id"]
          # Extract quoted text within single quotes
          quoted_matches = re.findall(r"'(.*?)'", text)
          for quote in quoted_matches:
              if any(keyword in quote.lower() for keyword in ["risk", "ai", "bias", "failure", "job", "privacy", "misuse"]):
                  quotes.append({"text": quote, "score": result["score"], "doc_id": doc_id})
    
      # Sort quotes by score and select top 2
      quotes = sorted(quotes, key=lambda x: x["score"], reverse=True)[:2]
    
      # Step 11: Abstractive Summary with Gemini
      context = "\n".join([result["text"] for result in reranked_results])
      prompt = f"""
      Query: {query}
      Context: {context}
    
      Provide a concise summary of AI risks based on the context, integrating the following quoted examples with citations:
      {chr(10).join([f"- '{q['text']}' (Document {q['doc_id']})" for q in quotes])}
    
      The response should paraphrase the main points, include the quoted examples, and cite the document IDs in parentheses.
      Answer:
      """
      model = genai.GenerativeModel("gemini-1.5-pro")
      response = model.generate_content(prompt)
      summary = response.text
    
      # Step 12: Display Results
      print("\nDense Retrieval Results for query:", query)
      for hit in dense_hits:
          print(f"- {hit.payload['text']} (Score: {hit.score:.4f})")
    
      print("\nBM25 Sparse Retrieval Results for query:", query)
      for result in bm25_results:
          print(f"- {result['text']} (Score: {result['score']:.4f})")
    
      print("\nHybrid Retrieval Results for query:", query)
      for text, info in sorted(hybrid_results.items(), key=lambda x: x[1]["score"], reverse=True)[:3]:
          print(f"- {text} (Hybrid Score: {info['score']:.4f})")
    
      print("\nRe-ranked Hybrid Results for query:", query)
      for result in reranked_results:
          print(f"- {result['text']} (Re-ranked Score: {result['score']:.4f})")
    
      print("\nMixed RAG Answer:")
      print(summary)
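
A practical safeguard for Mixed RAG is to confirm that every quoted span in the generated answer actually appears in the retrieved sources, since the model can silently paraphrase a "quote". The snippet below is a minimal sketch of such a check; it reuses the summary and reranked_results variables from the code above, and the verify_quotes helper name is illustrative.

      # Optional grounding check: is every single-quoted span in the answer verbatim in a source?
      def verify_quotes(answer: str, sources: list) -> list:
          report = []
          for quoted in re.findall(r"'(.*?)'", answer):
              doc_ids = [s["doc_id"] for s in sources if quoted in s["text"]]
              report.append({"quote": quoted, "grounded": bool(doc_ids), "doc_ids": doc_ids})
          return report

      print("\nQuote Grounding Check:")
      for check in verify_quotes(summary, reranked_results):
          status = "grounded" if check["grounded"] else "NOT found in sources"
          print(f"- '{check['quote']}' -> {status} {check['doc_ids']}")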
    

4. Advanced & Specialized RAG Types

Beyond the basic categories, several advanced RAG architectures leverage Gemini and other tools in more sophisticated ways.

4.1. Agent-Based RAG: Autonomous Exploration with Gemini Models

  • Concept: Instead of a fixed RAG pipeline, this uses an LLM-powered "agent" (often built using frameworks like LangChain or LlamaIndex, with a powerful Gemini model like gemini-1.5-pro-latest acting as the reasoning "brain"). This agent can dynamically decide what information it needs, when to retrieve it, and which tool to use (e.g., a Qdrant retriever for internal docs, a web search API, a calculator, a database query tool). It autonomously plans and executes steps to fulfill a complex user request.

  • Real-life Example: A research agent tasked with "Write a market report on vertical farming". It might use Gemini to plan: first, search internal docs (Qdrant) for company data; second, use a web search tool for market size; third, search Qdrant again for competitor info; finally, use Gemini to synthesize the report.

  • Pros: Highly flexible and powerful for complex, multi-step tasks. Can dynamically choose the best information source. Can potentially perform actions beyond just retrieval (e.g., calculations).

  • Cons: Significantly more complex to design, implement, and debug. Higher latency and cost due to multiple LLM calls for planning and execution. Potential for the agent to get stuck, hallucinate tool usage, or go off-topic. Relies heavily on the LLM's reasoning and planning capabilities (a variant that delegates tool selection to Gemini's function calling is sketched after the code snippet below).

  • Flow Chart (Agent-Based RAG):

  • Code Snippet (Agent-Based RAG):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
      from sentence_transformers import CrossEncoder
      import numpy as np
      from typing import List, Dict
      import json
      import re
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      # Initialize cross-encoder for re-ranking
      cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    
      COLLECTION_NAME = "Agent-Based"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 2: Sample local documents (vertical farming developments)
      docs = [
          {"id": "1", "text": "In 2024, AeroFarms expanded its vertical farming operations with a new facility in Saudi Arabia, leveraging AI-driven analytics for crop optimization."},
          {"id": "2", "text": "LED lighting advancements in 2023 reduced energy costs for vertical farms by 20%, enabling wider adoption of hydroponics."},
          {"id": "3", "text": "Urban Crop Solutions launched a financing program with Siemens in 2022 to support scalable vertical farming infrastructure."},
          {"id": "4", "text": "The global vertical farming market grew to USD 7.51 billion in 2024, driven by demand for organic produce and sustainable practices."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"], "doc_id": doc["id"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Simulated Web Search Tool (returns hard-coded sample results)
      def web_search_tool(query: str) -> List[Dict]:
          web_results = [
              {
                  "source": "grandviewresearch.com",
                  "text": "The global vertical farming market size was valued at USD 6.92 billion in 2023 and is expected to grow at a CAGR of 20.1% from 2023 to 2030. Vertical farms are becoming technologically advanced, with the use of LED lights and automated control systems.",
                  "url": "https://www.grandviewresearch.com"
              },
              {
                  "source": "marketsandmarkets.com",
                  "text": "The global vertical farming market size was estimated at USD 5.6 billion in 2024 and is poised to reach USD 13.7 billion by 2029, growing at a CAGR of 19.7%. Developments in IoT, AI, and hydroponics increase efficiency.",
                  "url": "https://www.marketsandmarkets.com"
              },
              {
                  "source": "straitsresearch.com",
                  "text": "In urban settings, vertical farms develop a farm-to-table system, reducing food packaging and waste. LED technology advancements drive market growth.",
                  "url": "https://straitsresearch.com"
              }
          ]
          return web_results
    
      # Step 6: Gemini-Powered Agent
      class GeminiAgent:
          def __init__(self, model_name: str = "gemini-1.5-pro"):
              self.model = genai.GenerativeModel(model_name)
    
          def plan_retrieval(self, query: str) -> Dict:
              prompt = f"""
              Query: {query}
              You are an agent planning data retrieval for a market report. Decide which tools to use:
              - Web search for real-time market data and trends
              - Local document search for internal data
              Provide a plan as a JSON object with 'tools' (list) and 'rationale' (string). Ensure the response is valid JSON without markdown or code blocks.
              Example:
              {{"tools": ["web_search", "local_search"], "rationale": "Web search for real-time data and local search for internal insights."}}
              """
              try:
                  response = self.model.generate_content(prompt)
                  # Clean response: remove markdown code blocks or other formatting
                  cleaned_text = re.sub(r'```(?:json)?\n|\n```', '', response.text).strip()
                  # Parse JSON safely
                  plan = json.loads(cleaned_text)
                  # Validate expected structure
                  if not isinstance(plan, dict) or "tools" not in plan or "rationale" not in plan:
                      raise ValueError("Invalid plan structure")
                  return plan
              except Exception as e:
                  print(f"Error parsing plan: {e}")
                  print(f"Raw response: {response.text}")
                  # Fallback plan
                  return {
                      "tools": ["web_search", "local_search"],
                      "rationale": "Fallback: Use web search for real-time data and local search for internal insights due to parsing error."
                  }
    
          def execute_retrieval(self, plan: Dict, query: str) -> List[Dict]:
              results = []
              for tool in plan["tools"]:
                  if tool == "web_search":
                      web_results = web_search_tool(query)
                      results.extend([
                          {"text": r["text"], "source": r["source"], "url": r["url"], "type": "web"}
                          for r in web_results
                      ])
                  elif tool == "local_search":
                      # Dense Retrieval
                      query_response = genai.embed_content(model="models/embedding-001", content=query)
                      query_vector = query_response["embedding"]
                      dense_hits = qdrant.query_points(
                          collection_name=COLLECTION_NAME,
                          query=query_vector,
                          limit=4,
                          with_payload=True
                      ).points
                      results.extend([
                          {"text": hit.payload["text"], "source": f"Local Doc {hit.payload['doc_id']}", "type": "local"}
                          for hit in dense_hits
                      ])
                      # Sparse Retrieval (BM25)
                      tokenized_query = query.lower().split()
                      bm25_scores = bm25.get_scores(tokenized_query)
                      bm25_results = [
                          {"text": docs[i]["text"], "score": bm25_scores[i], "doc_id": docs[i]["id"]}
                          for i in range(len(docs))
                          if bm25_scores[i] > 0
                      ]
                      bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:4]
                      results.extend([
                          {"text": r["text"], "source": f"Local Doc {r['doc_id']}", "type": "local"}
                          for r in bm25_results
                      ])
              return results
    
          def generate_report(self, query: str, retrieved_data: List[Dict]) -> str:
              context = "\n".join([f"Source: {d['source']}\n{d['text']}" for d in retrieved_data])
              prompt = f"""
              Query: {query}
              Context: {context}
    
              Generate a concise market report on recent developments in vertical farming. Include:
              - A summary of market size and growth trends.
              - Key technological advancements.
              - Notable industry developments (e.g., partnerships, expansions).
              - Citations for sources in parentheses (e.g., grandviewresearch.com).
              The response should be coherent, paraphrased, and professional, avoiding verbatim quotes unless necessary.
              Answer:
              """
              response = self.model.generate_content(prompt)
              return response.text
    
      # Step 7: Query
      query = "market report on vertical farming recent developments"
    
      # Step 8: Agent Execution
      agent = GeminiAgent()
      plan = agent.plan_retrieval(query)
      print("\nRetrieval Plan:", plan)
    
      retrieved_data = agent.execute_retrieval(plan, query)
    
      # Step 9: Re-ranking with Cross-Encoder
      rerank_inputs = [[query, data["text"]] for data in retrieved_data]
      rerank_scores = cross_encoder.predict(rerank_inputs)
      reranked_data = [
          {"text": data["text"], "source": data["source"], "score": rerank_scores[i], "type": data["type"]}
          for i, data in enumerate(retrieved_data)
      ]
      reranked_data = sorted(reranked_data, key=lambda x: x["score"], reverse=True)[:5]
    
      # Step 10: Generate Market Report
      report = agent.generate_report(query, reranked_data)
    
      # Step 11: Display Results
      print("\nRetrieved Data:")
      for data in retrieved_data:
          print(f"- Source: {data['source']}\n  {data['text']}")
    
      print("\nRe-ranked Data:")
      for data in reranked_data:
          print(f"- Source: {data['source']} (Score: {data['score']:.4f})\n  {data['text']}")
    
      print("\nMarket Report:")
      print(report)
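
An alternative to hand-parsing a JSON plan is to let Gemini select tools itself via the google-generativeai SDK's function-calling support, where annotated Python functions are passed as tools and invoked automatically. The sketch below is a minimal illustration under that assumption; it reuses web_search_tool and the Qdrant client from above, and the local_search_tool wrapper is hypothetical.

      # Sketch: delegate tool selection to Gemini's automatic function calling
      def local_search_tool(query: str) -> list[str]:
          """Search the internal Qdrant collection and return matching document texts."""
          vec = genai.embed_content(model="models/embedding-001", content=query)["embedding"]
          hits = qdrant.query_points(
              collection_name=COLLECTION_NAME, query=vec, limit=4, with_payload=True
          ).points
          return [hit.payload["text"] for hit in hits]

      tool_model = genai.GenerativeModel("gemini-1.5-pro", tools=[web_search_tool, local_search_tool])
      chat = tool_model.start_chat(enable_automatic_function_calling=True)
      tool_answer = chat.send_message(
          "Gather recent data on the vertical farming market using the available tools, then summarize it."
      )
      print("\nFunction-Calling Answer:")
      print(tool_answer.text)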
    

4.2. Multi-Modal RAG: Beyond Text with Gemini Pro Vision/Gemini 1.5 Pro

  • Concept: Extends RAG to handle multiple data types (images, audio, video) alongside text. This requires multi-modal LLMs capable of understanding these inputs, such as Google's gemini-1.5-pro-latest (which superseded Gemini Pro Vision). The retrieval step may also involve multi-modal embeddings (e.g., CLIP-style embeddings stored in Qdrant) to find relevant images or text based on an image query, or vice versa.

  • Real-life Example: Uploading a picture of a bird and asking, "What type of bird is this [image], and where does it live according to our bird database?" The system might retrieve text descriptions from Qdrant based on visual similarity (using image embeddings) or text extracted from the image by Gemini, then Gemini generates the answer using the image and the retrieved text.

  • Pros: Can answer queries involving non-textual data. Enables richer interactions (e.g., visual Q&A). Leverages the power of multi-modal LLMs like Gemini 1.5 Pro.

  • Cons: Requires multi-modal LLMs. Retrieval becomes more complex, typically needing image embedding models and a vector store that can handle their vectors (a CLIP-based indexing sketch follows the code snippet below). Indexing multi-modal data can be challenging. Potentially higher inference costs.

  • Flow Chart (Multi-Modal RAG):

  • Code Snippet (Multi-Modal RAG):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
      from sentence_transformers import CrossEncoder
      import numpy as np
      import re
      import json
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      # Initialize cross-encoder for re-ranking
      cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    
      COLLECTION_NAME = "Multi-Modal"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 2: Sample documents (bird descriptions)
      docs = [
          {"id": "1", "text": "The Northern Cardinal is a medium-sized songbird with a bright red crest, red body, and black face mask. It is commonly found in North America."},
          {"id": "2", "text": "The American Robin is a migratory bird with a reddish-orange breast, dark wings, and a white eye ring. It is widespread across the United States."},
          {"id": "3", "text": "The Blue Jay is known for its striking blue and white plumage, with a distinctive crest and black collar. It inhabits woodlands and suburban areas."},
          {"id": "4", "text": "The Black-capped Chickadee is a small bird with a black cap, white cheeks, and gray wings. It is known for its cheerful 'chick-a-dee' call."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"], "doc_id": doc["id"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Simulated Image Input and Query
      # In practice, replace image_description with an actual image file (e.g., Image.open("bird.jpg"))
      image_description = "A small bird with a red crest and black wings."
      text_query = "What bird is this?"
      combined_query = f"{text_query} Description: {image_description}"
    
      # Step 6: Simulated Visual Database (for image comparison)
      # In practice, store image embeddings in Qdrant using a vision model (e.g., CLIP)
      visual_db = [
          {"id": "img1", "description": "Bright red crest, black face, red body", "species": "Northern Cardinal"},
          {"id": "img2", "description": "Reddish-orange breast, dark wings", "species": "American Robin"},
          {"id": "img3", "description": "Blue and white plumage, black collar", "species": "Blue Jay"},
          {"id": "img4", "description": "Black cap, white cheeks, gray wings", "species": "Black-capped Chickadee"},
      ]
    
      def match_image(image_description: str) -> list[dict]:
          # Simulate image matching by comparing descriptions
          matches = []
          for img in visual_db:
              if "red crest" in image_description.lower() and "red crest" in img["description"].lower():
                  matches.append({"species": img["species"], "description": img["description"], "score": 0.9})
              elif "black wings" in image_description.lower() and "wings" in img["description"].lower():
                  matches.append({"species": img["species"], "description": img["description"], "score": 0.7})
          return sorted(matches, key=lambda x: x["score"], reverse=True)[:1]
    
      # Step 7: Dense Retrieval with Qdrant
      query_response = genai.embed_content(model="models/embedding-001", content=combined_query)
      query_vector = query_response["embedding"]
      dense_hits = qdrant.query_points(
          collection_name=COLLECTION_NAME,
          query=query_vector,
          limit=4,
          with_payload=True
      ).points
    
      # Step 8: BM25 Sparse Retrieval
      tokenized_query = combined_query.lower().split()
      bm25_scores = bm25.get_scores(tokenized_query)
      bm25_results = [
          {"text": docs[i]["text"], "score": bm25_scores[i], "doc_id": docs[i]["id"]}
          for i in range(len(docs))
          if bm25_scores[i] > 0
      ]
      bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:4]
    
      # Step 9: Hybrid Retrieval
      dense_scores = {hit.payload["text"]: hit.score for hit in dense_hits}
      bm25_scores = {result["text"]: result["score"] for result in bm25_results}
      all_texts = set(dense_scores.keys()).union(bm25_scores.keys())
    
      max_dense = max(dense_scores.values(), default=1.0)
      max_bm25 = max(bm25_scores.values(), default=1.0)
    
      hybrid_results = {}
      doc_ids = {hit.payload["text"]: hit.payload["doc_id"] for hit in dense_hits}
      for text in all_texts:
          dense_score = dense_scores.get(text, 0) / max_dense
          bm25_score = bm25_scores.get(text, 0) / max_bm25
          hybrid_score = 0.6 * dense_score + 0.4 * bm25_score
          hybrid_results[text] = {"score": hybrid_score, "doc_id": doc_ids.get(text, "unknown")}
    
      # Step 10: Re-ranking with Cross-Encoder
      rerank_inputs = [[combined_query, text] for text in hybrid_results.keys()]
      rerank_scores = cross_encoder.predict(rerank_inputs)
      reranked_results = [
          {"text": text, "score": rerank_scores[i], "doc_id": hybrid_results[text]["doc_id"]}
          for i, text in enumerate(hybrid_results.keys())
      ]
      reranked_results = sorted(reranked_results, key=lambda x: x["score"], reverse=True)[:3]
    
      # Step 11: Image Matching
      image_matches = match_image(image_description)
      image_context = "\n".join([f"Species: {m['species']}, Description: {m['description']}" for m in image_matches])
    
      # Step 12: Multi-Modal Generation with Gemini
      context = "\n".join([result["text"] for result in reranked_results])
      prompt = f"""
      Query: {text_query}
      Image Description: {image_description}
      Textual Context: {context}
      Image Context: {image_context}
    
      Identify the bird based on the image description and provided context. Provide a concise response, including:
      - The bird species.
      - Key identifying features.
      - A brief description from the context to support the identification.
      Use the textual and image context to ensure accuracy. Return the response as plain text without markdown or code blocks.
      """
      model = genai.GenerativeModel("gemini-1.5-pro")
      # In practice, pass an actual image: response = model.generate_content([prompt, Image.open("bird.jpg")])
      response = model.generate_content(prompt)
      # Clean response to remove any markdown
      cleaned_response = re.sub(r'```(?:text)?\n|\n```', '', response.text).strip()
      identification = cleaned_response
    
      # Step 13: Display Results
      print("\nDense Retrieval Results for query:", combined_query)
      for hit in dense_hits:
          print(f"- {hit.payload['text']} (Score: {hit.score:.4f})")
    
      print("\nBM25 Sparse Retrieval Results for query:", combined_query)
      for result in bm25_results:
          print(f"- {result['text']} (Score: {result['score']:.4f})")
    
      print("\nHybrid Retrieval Results for query:", combined_query)
      for text, info in sorted(hybrid_results.items(), key=lambda x: x[1]["score"], reverse=True)[:3]:
          print(f"- {text} (Hybrid Score: {info['score']:.4f})")
    
      print("\nRe-ranked Hybrid Results for query:", combined_query)
      for result in reranked_results:
          print(f"- {result['text']} (Re-ranked Score: {result['score']:.4f})")
    
      print("\nImage Matching Results:")
      for match in image_matches:
          print(f"- Species: {match['species']} (Score: {match['score']:.4f})")
    
      print("\nBird Identification:")
      print(identification)
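
The match_image function above only simulates visual matching on text descriptions. In a real multi-modal index, images would be embedded with a vision model such as CLIP (available through sentence-transformers, where the same model encodes both images and text into a shared 512-dimensional space) and stored in their own Qdrant collection. The sketch below is illustrative only: it assumes local image files such as cardinal.jpg exist and uses a hypothetical bird_images collection.

      # Sketch: image retrieval with CLIP embeddings in a separate Qdrant collection
      from PIL import Image
      from sentence_transformers import SentenceTransformer

      clip_model = SentenceTransformer("clip-ViT-B-32")  # encodes both PIL images and text

      qdrant.create_collection(
          collection_name="bird_images",
          vectors_config=VectorParams(size=512, distance=Distance.COSINE),
      )
      image_paths = ["cardinal.jpg", "robin.jpg"]  # assumed to exist locally
      image_points = [
          PointStruct(id=i, vector=clip_model.encode(Image.open(p)).tolist(), payload={"path": p})
          for i, p in enumerate(image_paths)
      ]
      qdrant.upsert(collection_name="bird_images", points=image_points)

      # A text query retrieves the closest image in the shared embedding space
      text_vec = clip_model.encode("a small bird with a red crest").tolist()
      closest = qdrant.query_points(collection_name="bird_images", query=text_vec, limit=1, with_payload=True).points
      print("\nClosest image:", closest[0].payload["path"] if closest else "none")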
    

4.3. Memory-Augmented RAG: Remembering the Past

  • Concept: Explicitly incorporates the history of the conversation into the RAG process. The retrieval step might use context from the chat history to disambiguate the current query or find more relevant information (e.g., searching Qdrant with an embedding of recent turns combined with the current query). Alternatively, or additionally, the augmentation step includes the conversation history alongside the newly retrieved context when prompting Gemini, allowing it to provide contextually aware follow-up answers.

  • Real-life Example: A troubleshooting chatbot remembering previous steps: User: "My Wi-Fi is down." Bot: "Try rebooting." User: "I did that." Bot (using history + retrieval for next steps based on context "reboot failed"): "Okay, since rebooting didn't work, let's check the cable connection [retrieved context about cables]."

  • Pros: Enables stateful, multi-turn conversations. Improves relevance by considering past interactions. Creates a more natural and helpful user experience for dialogues.

  • Cons: Managing conversation history adds complexity. Long histories can exceed LLM context windows or make retrieval queries less precise (a history-compression sketch follows the code snippet below). Deciding how best to integrate history into retrieval and generation requires careful design.

  • Flow Chart (Memory-Augmented RAG):

  • Code Snippet (Memory-Augmented RAG):

      import os
      import google.generativeai as genai
      from qdrant_client import QdrantClient
      from qdrant_client.models import PointStruct, VectorParams, Distance
      from rank_bm25 import BM25Okapi
      from sentence_transformers import CrossEncoder
      import numpy as np
      import re
      from typing import List, Dict
    
      # ENV variables
      QDRANT_HOST = "YOUR_QDRANT_HOST" 
      QDRANT_API_KEY = "YOUR_QDRANT_API_KEY" 
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Initialize Qdrant client
      qdrant = QdrantClient(
          url=QDRANT_HOST,
          api_key=QDRANT_API_KEY,
      )
    
      # Initialize cross-encoder for re-ranking
      cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')
    
      COLLECTION_NAME = "Memory-Augmented"
    
      # Step 1: Create collection (if not exists)
      try:
          qdrant.get_collection(collection_name=COLLECTION_NAME)
          print(f"Collection '{COLLECTION_NAME}' already exists.")
      except Exception:
          qdrant.create_collection(
              collection_name=COLLECTION_NAME,
              vectors_config=VectorParams(size=768, distance=Distance.COSINE),
          )
          print(f"Collection '{COLLECTION_NAME}' created.")
    
      # Step 2: Sample documents (router troubleshooting)
      docs = [
          {"id": "1", "text": "If the router is not connecting to the internet after a reboot, check the Ethernet cable connection to the modem and ensure the modem is powered on."},
          {"id": "2", "text": "A common issue post-reboot is incorrect Wi-Fi settings. Verify the SSID and password in the router’s admin panel."},
          {"id": "3", "text": "If the router’s lights are blinking amber after reboot, perform a factory reset by holding the reset button for 10 seconds."},
          {"id": "4", "text": "Slow internet after rebooting the router may indicate interference. Change the Wi-Fi channel to 1, 6, or 11 in the router settings."},
      ]
    
      # Step 3: Embed documents & upload to Qdrant (Dense Retrieval)
      points = []
      for doc in docs:
          response = genai.embed_content(model="models/embedding-001", content=doc["text"])
          embedding = response["embedding"]
          points.append(PointStruct(id=int(doc["id"]), vector=embedding, payload={"text": doc["text"], "doc_id": doc["id"]}))
    
      qdrant.upsert(collection_name=COLLECTION_NAME, points=points)
      print("Documents upserted to Qdrant for dense retrieval.")
    
      # Step 4: BM25 Setup for Sparse Retrieval
      tokenized_docs = [doc["text"].lower().split() for doc in docs]
      bm25 = BM25Okapi(tokenized_docs)
    
      # Step 5: Memory-Augmented Chatbot Class
      class TroubleshootingChatbot:
          def __init__(self, model_name: str = "gemini-1.5-pro", memory_size: int = 5):
              self.model = genai.GenerativeModel(model_name)
              self.memory: List[Dict] = []  # Store conversation history
              self.memory_size = memory_size  # Limit memory to last N exchanges
    
          def add_to_memory(self, user_query: str, system_response: str):
              """Add a user query and system response to memory."""
              self.memory.append({"user": user_query, "system": system_response})
              # Keep only the last memory_size exchanges
              self.memory = self.memory[-self.memory_size:]
    
          def get_context(self) -> str:
              """Generate context string from memory."""
              context = ""
              for exchange in self.memory:
                  context += f"User: {exchange['user']}\nSystem: {exchange['system']}\n"
              return context.strip()
    
          def retrieve_documents(self, query: str) -> List[Dict]:
              """Retrieve documents using hybrid retrieval, augmented by conversation context."""
              # Combine current query with memory context
              context = self.get_context()
              augmented_query = f"{context}\nCurrent Query: {query}" if context else query
    
              # Dense Retrieval
              query_response = genai.embed_content(model="models/embedding-001", content=augmented_query)
              query_vector = query_response["embedding"]
              dense_hits = qdrant.query_points(
                  collection_name=COLLECTION_NAME,
                  query=query_vector,
                  limit=4,
                  with_payload=True
              ).points
    
              # Sparse Retrieval (BM25)
              tokenized_query = augmented_query.lower().split()
              bm25_scores = bm25.get_scores(tokenized_query)
              bm25_results = [
                  {"text": docs[i]["text"], "score": bm25_scores[i], "doc_id": docs[i]["id"]}
                  for i in range(len(docs))
                  if bm25_scores[i] > 0
              ]
              bm25_results = sorted(bm25_results, key=lambda x: x["score"], reverse=True)[:4]
    
              # Hybrid Retrieval
              dense_scores = {hit.payload["text"]: hit.score for hit in dense_hits}
              bm25_scores = {result["text"]: result["score"] for result in bm25_results}
              all_texts = set(dense_scores.keys()).union(bm25_scores.keys())
    
              max_dense = max(dense_scores.values(), default=1.0)
              max_bm25 = max(bm25_scores.values(), default=1.0)
    
              hybrid_results = {}
              doc_ids = {hit.payload["text"]: hit.payload["doc_id"] for hit in dense_hits}
              for text in all_texts:
                  dense_score = dense_scores.get(text, 0) / max_dense
                  bm25_score = bm25_scores.get(text, 0) / max_bm25
                  hybrid_score = 0.6 * dense_score + 0.4 * bm25_score
                  hybrid_results[text] = {"score": hybrid_score, "doc_id": doc_ids.get(text, "unknown")}
    
              # Re-ranking with Cross-Encoder
              rerank_inputs = [[augmented_query, text] for text in hybrid_results.keys()]
              rerank_scores = cross_encoder.predict(rerank_inputs)
              reranked_results = [
                  {"text": text, "score": rerank_scores[i], "doc_id": hybrid_results[text]["doc_id"]}
                  for i, text in enumerate(hybrid_results.keys())
              ]
              return sorted(reranked_results, key=lambda x: x["score"], reverse=True)[:3]
    
          def generate_response(self, query: str, retrieved_docs: List[Dict]) -> str:
              """Generate a troubleshooting response using conversation history and retrieved documents."""
              context = self.get_context()
              doc_context = "\n".join([f"Doc {d['doc_id']}: {d['text']}" for d in retrieved_docs])
              prompt = f"""
              You are a troubleshooting chatbot helping with router issues. Use the conversation history and retrieved documents to provide a concise, relevant response to the current query. Avoid markdown or code blocks in the response.
    
              Conversation History:
              {context}
    
              Retrieved Documents:
              {doc_context}
    
              Current Query: {query}
    
              Provide a clear troubleshooting step or answer, referencing prior conversation details (e.g., "Since you rebooted the router") if relevant. Keep the response natural and concise.
              """
              try:
                  response = self.model.generate_content(prompt)
                  # Clean response to remove any markdown
                  cleaned_response = re.sub(r'```(?:text)?\n|\n```', '', response.text).strip()
                  return cleaned_response
              except Exception as e:
                  print(f"Error generating response: {e}")
                  return "Sorry, I encountered an issue. Please try again or provide more details."
    
      # Step 6: Simulate Troubleshooting Interaction
      chatbot = TroubleshootingChatbot()
    
      # Simulated conversation
      queries = [
          "My router isn’t connecting to the internet. I just rebooted it.",
          "The lights on the router are blinking amber now. What should I do?"
      ]
    
      for query in queries:
          # Retrieve documents
          retrieved_docs = chatbot.retrieve_documents(query)
    
          # Generate response
          response = chatbot.generate_response(query, retrieved_docs)
    
          # Add to memory
          chatbot.add_to_memory(query, response)
    
          # Display results
          print(f"\nUser Query: {query}")
          print("\nRetrieved Documents:")
          for doc in retrieved_docs:
              print(f"- Doc {doc['doc_id']}: {doc['text']} (Score: {doc['score']:.4f})")
          print("\nSystem Response:")
          print(response)
    
      # Step 7: Display Conversation History
      print("\nConversation History:")
      print(chatbot.get_context())
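
One way to keep long dialogues inside the context window is to compress older turns into a short Gemini-generated summary while keeping the most recent exchanges verbatim. The helper below is a minimal sketch that builds on the chatbot defined above; the summarize_memory name is illustrative.

      # Sketch: compress older conversation turns into a summary, keep recent turns verbatim
      def summarize_memory(bot: TroubleshootingChatbot, keep_recent: int = 2) -> str:
          older, recent = bot.memory[:-keep_recent], bot.memory[-keep_recent:]
          if not older:
              return bot.get_context()
          transcript = "\n".join(f"User: {e['user']}\nSystem: {e['system']}" for e in older)
          summary = bot.model.generate_content(
              "Summarize this troubleshooting conversation in two sentences, "
              "preserving any steps the user has already tried:\n" + transcript
          ).text.strip()
          recent_text = "\n".join(f"User: {e['user']}\nSystem: {e['system']}" for e in recent)
          return f"Summary of earlier conversation: {summary}\n{recent_text}"

      print("\nCompressed Conversation History:")
      print(summarize_memory(chatbot))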
    

4.4. Structured Data RAG: Databases as Knowledge Sources + Gemini for Text-to-SQL

  • Concept: Retrieves information directly from structured databases (like SQL databases - PostgreSQL, MySQL, etc.). A key part often involves using a powerful LLM like Gemini, prompted with the database schema (table names, columns, types, relationships), to translate the user's natural language question into a formal database query (e.g., SQL). The system executes this generated query against the database, retrieves the results, and then uses Gemini again to synthesize a natural language answer based on those structured results.

  • Real-life Example: A business intelligence bot answering "What were our total sales for 'Electronics' last quarter?" Gemini generates the SQL query (SELECT SUM(amount) FROM sales WHERE product_category = 'Electronics' AND sale_date BETWEEN '...' AND '...') -> the query runs on the sales database -> Gemini summarizes the returned sales figure ($X,XXX.XX) into a sentence.

  • Pros: Allows querying structured, real-time data in databases. Leverages the precision of SQL. Can answer quantitative questions accurately. Gemini's Text-to-SQL capability avoids manual query writing.

  • Cons: Relies heavily on the LLM's ability to generate correct SQL from the schema and the natural language question. Requires providing accurate and clear schema information to the LLM. Potential for generating inefficient or incorrect queries. Database access and query execution add latency. Security considerations for database access are crucial (a simple read-only query guard is sketched after the code snippet below).

  • Flow Chart (Structured Data RAG):

  • Code Snippet (Structured Data RAG):

      import os
      import google.generativeai as genai
      import sqlite3
      import re
      import json
      from datetime import datetime
      from typing import List, Dict, Any
    
      # ENV variables
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Step 1: Set up SQLite database
      def setup_database():
          """Create and populate a sample sales database."""
          conn = sqlite3.connect(":memory:")  # In-memory database for demo
          cursor = conn.cursor()
    
          # Create sales table
          cursor.execute("""
              CREATE TABLE sales (
                  sale_id INTEGER PRIMARY KEY,
                  sale_date DATE,
                  amount FLOAT,
                  product_category TEXT
              )
          """)
    
          # Insert sample data (sales from Q4 2024 and earlier)
          sample_sales = [
              ("2024-10-15", 1500.50, "Electronics"),
              ("2024-11-01", 800.25, "Clothing"),
              ("2024-12-10", 1200.75, "Electronics"),
              ("2024-07-05", 600.00, "Books"),
              ("2024-06-30", 900.00, "Clothing"),
          ]
          cursor.executemany("INSERT INTO sales (sale_date, amount, product_category) VALUES (?, ?, ?)", sample_sales)
          conn.commit()
          return conn
    
      # Step 2: Structured Data RAG Class
      class StructuredDataRAG:
          def __init__(self, model_name: str = "gemini-1.5-pro", db_conn: sqlite3.Connection = None):
              self.model = genai.GenerativeModel(model_name)
              self.conn = db_conn
              self.schema = """
              Table: sales
              Columns:
              - sale_id (INTEGER, PRIMARY KEY)
              - sale_date (DATE, e.g., '2024-07-15')
              - amount (FLOAT, sale amount in USD)
              - product_category (TEXT, e.g., 'Electronics', 'Clothing')
              """
    
          def generate_sql_query(self, query: str) -> str:
              """Generate an SQL query based on the natural language query."""
              prompt = f"""
              You are an expert SQL query generator. Given a natural language query and a database schema, generate a valid SQL query to retrieve the requested data. Return only the SQL query as plain text, without markdown, code blocks, or explanations.
    
              Database Schema:
              {self.schema}
    
              Query: {query}
    
              Example:
              For "total sales in 2024", return: SELECT SUM(amount) FROM sales WHERE strftime('%Y', sale_date) = '2024'
    
              Notes:
              - For "last quarter," assume the current date is {datetime.now().strftime('%Y-%m-%d')} and target the previous quarter (e.g., Q4 2024 for April 2025).
              - Use strftime for date comparisons.
    
              Generate the SQL query:
              """
              try:
                  response = self.model.generate_content(prompt)
                  cleaned_response = re.sub(r'```(?:sql)?\n|\n```', '', response.text).strip()
                  return cleaned_response
              except Exception as e:
                  print(f"Error generating SQL query: {e}")
                  return "SELECT 0 AS error"  # Fallback query
    
          def execute_query(self, sql_query: str) -> List[Dict]:
              """Execute the SQL query and return results as a list of dictionaries."""
              try:
                  cursor = self.conn.cursor()
                  cursor.execute(sql_query)
                  columns = [desc[0] for desc in cursor.description]
                  results = [dict(zip(columns, row)) for row in cursor.fetchall()]
                  # Handle null or empty results
                  if not results:
                      return [{"total_sales": 0.0}]
                  return results
              except Exception as e:
                  print(f"Error executing SQL query: {e}")
                  return [{"error": "Failed to execute query"}]
    
          def generate_response(self, query: str, data: List[Dict]) -> str:
              """Generate a natural language response based on retrieved data."""
              # Convert data to JSON string without f-string to avoid format specifier issues
              data_str = json.dumps(data, indent=2)
              # Build prompt as a regular string concatenation to avoid f-string issues
              prompt = (
                  "You are a data analyst. Given a natural language query and retrieved data from a database, "
                  "generate a concise, natural language response summarizing the results. "
                  "Return the response as plain text without markdown or code blocks.\n\n"
                  "Query: " + query + "\n\n"
                  "Retrieved Data: " + data_str + "\n\n"
                  "Example:\n"
                  "Query: total sales in 2024\n"
                  "Data: [{\"sum\": 5000.0}]\n"
                  "Response: The total sales in 2024 were $5,000.\n\n"
                  "Generate the response:"
              )
              try:
                  response = self.model.generate_content(prompt)
                  cleaned_response = re.sub(r'```(?:text)?\n|\n```', '', response.text).strip()
                  # Handle null or zero results in the response
                  if any("total_sales" in d and d["total_sales"] == 0.0 for d in data):
                      return "No sales were recorded for the last quarter."
                  return cleaned_response
              except Exception as e:
                  print(f"Error generating response: {e}")
                  return "Sorry, I couldn’t process the data. Please try again."
    
      # Step 3: Simulate Query
      query = "total sales last quarter"
    
      # Step 4: Initialize Database and RAG
      conn = setup_database()
      rag = StructuredDataRAG(db_conn=conn)
    
      # Step 5: Generate and Execute SQL Query
      sql_query = rag.generate_sql_query(query)
      print("\nGenerated SQL Query:")
      print(sql_query)
    
      data = rag.execute_query(sql_query)
      print("\nRetrieved Data:")
      print(json.dumps(data, indent=2))
    
      # Step 6: Generate Response
      response = rag.generate_response(query, data)
      print("\nResponse:")
      print(response)
    
      # Step 7: Clean up
      conn.close()
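
Since the generated SQL is executed as-is, a production system should at least reject anything that is not a single read-only SELECT (and ideally connect with a read-only database role). The guard below is a simple, illustrative sketch that could be applied to sql_query before execute_query is called.

      # Sketch: minimal read-only guard for LLM-generated SQL
      def is_safe_sql(candidate: str) -> bool:
          """Allow only a single SELECT statement with no data-modifying keywords."""
          stripped = candidate.strip().rstrip(";")
          if ";" in stripped:  # more than one statement
              return False
          if not stripped.upper().startswith("SELECT"):
              return False
          forbidden = ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "ATTACH", "PRAGMA"]
          return not any(word in stripped.upper() for word in forbidden)

      if is_safe_sql(sql_query):
          print("\nSQL query passed the read-only check.")
      else:
          print("\nBlocked potentially unsafe SQL query:", sql_query)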
    

4.5. Graph-Based RAG: Relationships and Connections + Gemini for Text-to-GraphQuery

  • Concept: Leverages knowledge graphs (like Neo4j, NebulaGraph) as the structured information source. Similar to Structured Data RAG, it often uses an LLM like Gemini, prompted with the graph schema (node labels, relationship types, properties), to translate natural language questions into graph query languages (e.g., Cypher for Neo4j, SPARQL for RDF graphs). The query is executed against the graph database, and Gemini synthesizes the answer from the results, which often represent complex relationships and paths.

  • Real-life Example: Asking "Which actors starred in movies directed by Christopher Nolan?" Gemini generates a Cypher query (MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Director {name: 'Christopher Nolan'}) RETURN a.name) -> the query runs on the movie knowledge graph -> Gemini summarizes the resulting list of actors.

  • Pros: Excellent for answering questions about relationships, connections, and paths within data. Can uncover insights hidden in relational structures. Leverages the power of graph databases. Gemini's Text-to-Cypher/SPARQL ability automates graph querying.

  • Cons: Requires a well-structured knowledge graph. Relies heavily on the LLM's ability to generate correct graph queries from the schema. Graph query languages can be complex. Performance depends on graph database efficiency and indexing. Schema representation for the LLM is critical. (The code snippet below uses a mocked graph; a sketch backed by the official neo4j Python driver follows it.)

  • Flow Chart (Graph-Based RAG):

  • Code Snippet (Graph-Based RAG):

      import os
      import google.generativeai as genai
      import re
      import json
      from typing import List, Dict, Any
    
      # ENV variables
      GEMINI_API_KEY = "YOUR_GEMINI_API_KEY"
    
      # Initialize Gemini
      genai.configure(api_key=GEMINI_API_KEY)
    
      # Step 1: Mock Neo4j Knowledge Graph
      class MockNeo4j:
          """Simulate a Neo4j knowledge graph with movie data."""
          def __init__(self):
              # Sample graph data: nodes (Actor, Movie, Director) and relationships
              self.graph_data = [
                  {"actor": "Christian Bale", "movie": "The Dark Knight", "director": "Christopher Nolan", "year": 2008},
                  {"actor": "Heath Ledger", "movie": "The Dark Knight", "director": "Christopher Nolan", "year": 2008},
                  {"actor": "Leonardo DiCaprio", "movie": "Inception", "director": "Christopher Nolan", "year": 2010},
                  {"actor": "Joseph Gordon-Levitt", "movie": "Inception", "director": "Christopher Nolan", "year": 2010},
                  {"actor": "Cillian Murphy", "movie": "Oppenheimer", "director": "Christopher Nolan", "year": 2023},
                  {"actor": "Robert Downey Jr.", "movie": "Oppenheimer", "director": "Christopher Nolan", "year": 2023},
                  {"actor": "Keanu Reeves", "movie": "The Matrix", "director": "Wachowskis", "year": 1999},
              ]
    
          def run_query(self, cypher_query: str) -> List[Dict]:
              """Simulate executing a Cypher query against the graph."""
              try:
                  results = []
                  # Parse common Cypher patterns
                  if "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Director {name: 'Christopher Nolan'})" in cypher_query:
                      for entry in self.graph_data:
                          if entry["director"] == "Christopher Nolan":
                              result = {"actor": entry["actor"], "movie": entry["movie"]}
                              if "m.year" in cypher_query:
                                  result["year"] = entry["year"]
                              results.append(result)
                  elif "MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)" in cypher_query and "Christopher Nolan" in cypher_query:
                      for entry in self.graph_data:
                          if entry["director"] == "Christopher Nolan":
                              result = {"actor": entry["actor"], "movie": entry["movie"]}
                              if "m.year" in cypher_query:
                                  result["year"] = entry["year"]
                              results.append(result)
                  else:
                      results.append({"error": "Unsupported query"})
                  return results if results else [{"error": "No results found"}]
              except Exception as e:
                  print(f"Error executing Cypher query: {e}")
                  return [{"error": "Failed to execute query"}]
    
      # Step 2: Graph-Based RAG Class
      class GraphBasedRAG:
          def __init__(self, model_name: str = "gemini-1.5-pro", graph_db: Any = None):
              self.model = genai.GenerativeModel(model_name)
              self.graph_db = graph_db
              self.schema = """
              Knowledge Graph Schema:
              Nodes:
              - Actor (properties: name)
              - Movie (properties: title, year)
              - Director (properties: name)
              Relationships:
              - (:Actor)-[:ACTED_IN]->(:Movie)
              - (:Director)-[:DIRECTED]->(:Movie)
              Example:
              (a:Actor {name: 'Christian Bale'})-[:ACTED_IN]->(m:Movie {title: 'The Dark Knight', year: 2008})<-[:DIRECTED]-(d:Director {name: 'Christopher Nolan'})
              """
    
          def is_safe_cypher(self, query: str) -> bool:
              """Validate Cypher query for safety."""
              dangerous_keywords = ["CREATE", "DELETE", "REMOVE", "SET", "MERGE"]
              return not any(keyword in query.upper() for keyword in dangerous_keywords)
    
          def generate_cypher_query(self, query: str) -> str:
              """Generate a Cypher query based on the natural language query."""
              prompt = f"""
              You are an expert Cypher query generator for a Neo4j knowledge graph. Given a natural language query and a graph schema, generate a valid Cypher query to retrieve the requested data. Return only the Cypher query as plain text, without markdown, code blocks, or explanations.
    
              Graph Schema:
              {self.schema}
    
              Query: {query}
    
              Example:
              For "actors in Nolan’s movies", return:
              MATCH (a:Actor)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Director {{name: 'Christopher Nolan'}})
              RETURN a.name AS actor, m.title AS movie
    
              Generate the Cypher query:
              """
              try:
                  response = self.model.generate_content(prompt)
                  cleaned_response = re.sub(r'```(?:cypher)?\n|\n```', '', response.text).strip()
                  if not self.is_safe_cypher(cleaned_response):
                      print("Unsafe Cypher query detected")
                      return "MATCH () RETURN 'error' AS error"
                  return cleaned_response
              except Exception as e:
                  print(f"Error generating Cypher query: {e}")
                  return "MATCH () RETURN 'error' AS error"
    
          def execute_query(self, cypher_query: str) -> List[Dict]:
              """Execute the Cypher query against the graph database."""
              return self.graph_db.run_query(cypher_query)
    
          def generate_response(self, query: str, data: List[Dict]) -> str:
              """Generate a natural language response based on retrieved graph data."""
              data_str = json.dumps(data, indent=2)
              prompt = (
                  "You are a data analyst. Given a natural language query and retrieved data from a knowledge graph, "
                  "generate a concise, natural language response summarizing the results. "
                  "Return the response as plain text without markdown or code blocks.\n\n"
                  "Query: " + query + "\n\n"
                  "Retrieved Data: " + data_str + "\n\n"
                  "Example:\n"
                  "Query: actors in Nolan’s movies\n"
                  "Data: [{\"actor\": \"Christian Bale\", \"movie\": \"The Dark Knight\"}, {\"actor\": \"Leonardo DiCaprio\", \"movie\": \"Inception\"}]\n"
                  "Response: Actors in Christopher Nolan’s movies include Christian Bale (The Dark Knight) and Leonardo DiCaprio (Inception).\n\n"
                  "Generate the response:"
              )
              try:
                  # Skip the Gemini call entirely if retrieval already returned an error entry.
                  if any("error" in d for d in data):
                      return "No relevant data found for the query."
                  response = self.model.generate_content(prompt)
                  cleaned_response = re.sub(r'```(?:text)?\n|\n```', '', response.text).strip()
                  return cleaned_response
              except Exception as e:
                  print(f"Error generating response: {e}")
                  return "Sorry, I couldn’t process the data. Please try again."
    
          def verify_response(self, query: str, cypher_query: str, data: List[Dict], response: str) -> str:
              """Verify the accuracy of the generated response."""
              data_str = json.dumps(data, indent=2)
              prompt = (
                  "Verify if the response '" + response + "' accurately reflects the query '" + query + "', "
                  "Cypher query '" + cypher_query + "', and data " + data_str + ".\n"
                  "Return a plain text verdict without markdown or code blocks."
              )
              try:
                  verification = self.model.generate_content(prompt)
                  return re.sub(r'```(?:text)?\n|\n```', '', verification.text).strip()
              except Exception as e:
                  print(f"Error verifying response: {e}")
                  return "Verification failed."
    
      # Step 3: Simulate Query
      query = "actors in Nolan’s movies"
    
      # Step 4: Initialize Graph and RAG
      graph_db = MockNeo4j()
      rag = GraphBasedRAG(graph_db=graph_db)
    
      # Step 5: Generate and Execute Cypher Query
      cypher_query = rag.generate_cypher_query(query)
      print("\nGenerated Cypher Query:")
      print(cypher_query)
    
      data = rag.execute_query(cypher_query)
      print("\nRetrieved Data:")
      print(json.dumps(data, indent=2))
    
      # Step 6: Generate Response
      response = rag.generate_response(query, data)
      print("\nResponse:")
      print(response)
    
      # Step 7: Verify Response
      verification = rag.verify_response(query, cypher_query, data, response)
      print("\nVerification:")
      print(verification)
    

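The MockNeo4j class above only simulates a handful of graph patterns. As a rough sketch (not part of the original example), the same run_query interface could be backed by a real Neo4j instance using the official neo4j Python driver; the connection URI and credentials below are placeholders you would replace with your own.

```python
# Hypothetical drop-in replacement for MockNeo4j using the official `neo4j` driver
# (pip install neo4j). URI and credentials are placeholders, not values from this article.
from typing import Dict, List

from neo4j import GraphDatabase


class Neo4jGraph:
    def __init__(self, uri: str = "bolt://localhost:7687",
                 user: str = "neo4j", password: str = "password"):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def run_query(self, cypher_query: str) -> List[Dict]:
        """Run a read-only Cypher query and return plain dictionaries."""
        try:
            with self.driver.session() as session:
                result = session.run(cypher_query)
                return [record.data() for record in result]
        except Exception as e:
            print(f"Error executing Cypher query: {e}")
            return [{"error": "Failed to execute query"}]

    def close(self):
        self.driver.close()
```

Because GraphBasedRAG only ever calls graph_db.run_query(), passing GraphBasedRAG(graph_db=Neo4jGraph()) would work without further changes, while is_safe_cypher still blocks write operations.
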
5. Conclusion: Choosing the Right RAG Approach

RAG is not a single technique but a flexible design pattern. The optimal RAG architecture for your application depends heavily on your specific requirements, the nature of your data, and the user experience you aim to provide. Leveraging powerful tools like Google Gemini for generation and reasoning, combined with efficient retrieval systems like the Qdrant vector database, offers a robust foundation for building sophisticated and reliable AI applications.

5.1. Key Considerations for Selecting a RAG Type:

  • Knowledge Source:

    • Unstructured Text (PDFs, docs, websites): Dense (Gemini Embeddings + Qdrant), Sparse (BM25), or Hybrid retrieval is ideal.

    • Structured Tables (SQL/NoSQL): Structured Data RAG with Gemini for Text-to-SQL.

    • Knowledge Graphs (Neo4j): Graph RAG with Gemini for Text-to-GraphQuery.

    • Mixed Data (Images, Text): Multi-Modal RAG with Gemini 1.5 Pro.

  • Query Complexity:

    • Simple lookups/Q&A: Standard Pre-Retrieval RAG is often sufficient.

    • Multi-step reasoning, complex comparisons: Iterative RAG or Agent-Based RAG using Gemini's reasoning.

  • Accuracy vs. Fluency:

    • Need exact snippets/definitions: Extractive RAG.

    • Need conversational summaries, explanations: Abstractive RAG (standard Gemini use).

    • Need balance of both: Mixed RAG or Abstractive RAG prompted to cite sources.

  • Latency Requirements:

    • Fastest: Sparse retrieval.

    • Very Fast: Dense retrieval (Gemini Embeddings + Qdrant) is highly optimized.

    • Slower: Iterative or Agent-based flows involve multiple LLM calls and retrievals. Multi-modal processing can also add latency.

  • Data Freshness Needs: Standard Pre-Retrieval RAG accesses the latest indexed data in Qdrant/DBs. Post-Retrieval might give a slightly stale initial answer before verification.

  • Citations / Explainability: Mixed RAG, Post-Retrieval RAG, or explicitly prompting Gemini in Abstractive RAG to reference retrieved document IDs/sources. Qdrant payloads can store metadata for citation (see the citation sketch after this list).

  • Conversation Context: One-off queries vs. ongoing dialogue? Memory-Augmented RAG using Gemini's chat capabilities is essential for dialogues.

  • Cost & Compute:

    • API Calls: Gemini API usage (generation, embeddings) has costs based on tokens/characters. More complex flows (Iterative, Agent) mean more calls.

    • Infrastructure: Dense retrieval requires hosting/managing Qdrant (or using Qdrant Cloud). Storing embeddings also has costs.

    • Gemini Model Choice: gemini-1.5-pro is more capable but also more expensive than gemini-1.0-pro or smaller, faster models such as the Flash variants.
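
To make the citations point above concrete, here is a minimal sketch of surfacing source metadata from Qdrant payloads. It assumes a collection named "docs" whose points were stored with "source" and "text" payload fields, and it embeds the query with text-embedding-004 via google.generativeai; all of these names are illustrative, not from the original article.

```python
# Minimal citation sketch: fetch top-k chunks from Qdrant and expose payload metadata
# so Gemini can cite sources as [1], [2], ...
# The collection name and payload field names are assumptions for illustration.
import google.generativeai as genai
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

def retrieve_with_citations(query: str, top_k: int = 3) -> str:
    # Embed the user query with the same Google embedding model used at indexing time.
    query_vec = genai.embed_content(
        model="models/text-embedding-004",
        content=query,
        task_type="retrieval_query",
    )["embedding"]

    hits = client.search(collection_name="docs", query_vector=query_vec, limit=top_k)

    # Each point's payload carries whatever metadata was attached when it was upserted.
    return "\n".join(
        f"[{i + 1}] ({hit.payload.get('source', 'unknown')}) {hit.payload.get('text', '')}"
        for i, hit in enumerate(hits)
    )
```

The numbered context returned here can be placed into the augmented prompt together with an instruction such as "cite the bracketed source numbers you used", giving Gemini an explicit handle for attribution.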

5.2. The Future of Retrieval-Augmented Generation

RAG is a rapidly evolving field, pushing the boundaries of what LLMs can achieve. We can expect advancements like:

  • Smarter Retrieval: More adaptive retrieval strategies, better query transformations using LLMs (like Gemini) to understand nuances, improved hybrid fusion techniques (such as reciprocal rank fusion; see the sketch after this list), and potentially LLMs selecting the best retrieval method dynamically.

  • Deeper Integration: Tighter coupling between the retriever (like Qdrant) and the LLM (Gemini), potentially allowing the LLM to guide the retrieval process more directly, interact with retrieval parameters, or even access retrieval tools intrinsically.

  • End-to-End Optimization: Frameworks and techniques for jointly training or fine-tuning the retriever, embedder (Gemini embeddings), and generator (Gemini LLM) components for specific tasks, leading to more synergistic performance.

  • Enhanced Agentic RAG: More sophisticated agents (powered by models like Gemini 1.5 Pro, Gemini 2.0 Flash and beyond) capable of using a wider array of tools (including Qdrant, databases, APIs), planning more complex tasks, learning from interactions, and performing more robust error handling.

  • Proactive RAG: Systems that anticipate user information needs based on context or past behavior and retrieve relevant information from sources like Qdrant before a question is explicitly asked, enabling faster and more contextually relevant interactions.
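
As an illustration of the hybrid fusion idea mentioned under Smarter Retrieval, below is a minimal Reciprocal Rank Fusion (RRF) sketch that merges a dense (vector) ranking with a sparse (BM25-style) ranking. The document IDs and the constant k=60 are illustrative defaults, not values from this article.

```python
# Minimal Reciprocal Rank Fusion (RRF) sketch for hybrid retrieval.
# `dense_ranked` and `sparse_ranked` are lists of document IDs, best first,
# e.g. from a Qdrant vector search and a BM25 keyword search respectively.
from collections import defaultdict
from typing import List

def reciprocal_rank_fusion(dense_ranked: List[str],
                           sparse_ranked: List[str],
                           k: int = 60) -> List[str]:
    scores = defaultdict(float)
    for ranking in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranking):
            # Documents ranked highly by either retriever receive a larger contribution.
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Example with made-up document IDs: d1 and d3 rise to the top
# because both retrievers rank them well.
print(reciprocal_rank_fusion(["d3", "d1", "d7"], ["d1", "d5", "d3"]))
```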

By understanding the different types of RAG and carefully considering the trade-offs, developers can build powerful, accurate, and trustworthy applications leveraging the combined strengths of generative models like Gemini and efficient retrieval systems like Qdrant.

Open in GitHub

Connect With Me

Twitter: Rock12231

GitHub: Rock12231

LinkedIn: Rock1223

Email: avinashkumar2rock@gmail.com
