The Ultimate Research Sidekick: Constructing a RAG Assistant with the Gemini API Python SDK

nataliyah ahmad

Ever found yourself sifting through a long PDF document – maybe a research paper, a report, or an ebook – looking for a specific piece of information? What if you could just ask the document a question and get a direct answer?

Good news! With the power of Large Language Models (LLMs) like Google Gemini and smart tools like vector databases (ChromaDB), you absolutely can. And we can do it all right here in a Kaggle Notebook!

In this tutorial, we'll walk through how to load a PDF, break it down, understand its content using AI embeddings, store it in a searchable database, and finally, use Gemini to answer your questions based only on the information in your PDF.

Let's dive in!

What You'll Need

  1. A Kaggle Account: That's where we'll run our code.

  2. A Google API Key: You'll need one to access Google's Gemini models. You can get one easily (and for free!) at Google AI Studio.

  3. A PDF Document: Any PDF you want to "chat" with!

Step 1: Get Your PDF into Kaggle

Before we write any code, we need to get your PDF file accessible within your Kaggle Notebook environment. The best way to do this is by adding it as a Kaggle Dataset.

Here's how:

  1. Create a New Notebook: Go to Kaggle and create a new notebook (or open an existing one).

  2. Add Data: Look for the "+ Add Data" button (usually on the right side or top menu). Click it.

  3. Create New Dataset: In the "Add Data" sidebar that appears, click the "Create New Dataset" button.

  4. Upload Your File: Drag and drop your PDF file into the upload area or click to browse your files.

  5. Name Your Dataset: Give your dataset a clear name (e.g., my-research-paper, company-report-pdf, my-ebook).

  6. Set Privacy: Choose if you want the dataset to be Public or Private. For personal documents, definitely choose Private!

  7. Create: Click the "Create" button.

  8. Attach to Notebook: Once the dataset is created, go back to your notebook. Click "+ Add Data" again. This time, search for the dataset name you just created under "Your Datasets". Click "Add" next to it.

Now, your PDF is available in your notebook! You'll find it under the /kaggle/input/your-dataset-name/ directory. Note down the exact path to your file (e.g., /kaggle/input/my-research-paper/my-awesome-paper.pdf). In our example code, the path is /kaggle/input/pdf-of-reasoning-survey/a survey on efficient reasoning for large language models.pdf.
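
If you're not sure of the exact filename or path, you can list everything under /kaggle/input from a code cell. This is essentially the starter snippet Kaggle adds to new notebooks, so it should work as-is:

import os

# Walk the Kaggle input directory and print every file it contains,
# so you can copy the exact path to your PDF.
for dirname, _, filenames in os.walk("/kaggle/input"):
    for filename in filenames:
        print(os.path.join(dirname, filename))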

Step 2: Set Up Your Google API Key Securely

You should never put your API key directly in your code where others might see it. Kaggle provides a secure way to store secrets like API keys.

  1. Add a Secret: In your Kaggle Notebook, click the "Secrets" tab (often found in the right-hand sidebar, looks like a key icon).

  2. Add New Secret: Click "Add a new secret".

  3. Name the Secret: In the "Name" field, type GOOGLE_API_KEY. This must match the name used in the code.

  4. Paste Your Key: In the "Secret" field, paste your actual Google API Key you got from Google AI Studio.

  5. Save: Click "Save".

  6. Enable: Make sure the "Attached to notebook" checkbox is ticked for your current notebook.

Now, your code can safely access your API key without it being visible to anyone looking at your notebook!

Step 3: Install the Necessary Tools

Our code uses a few external libraries, so we need to install them in the Kaggle environment. The %%capture at the top of the install cell just hides the noisy installation output to keep the notebook clean, and the ! tells the notebook to run that line as a shell command rather than Python. Run the installs in one cell, then do the imports and read your API key in a second cell (otherwise %%capture would also swallow the version check at the end).

%%capture
# Uninstall potentially conflicting packages and install what we need
!pip uninstall -qqy jupyterlab kfp  # Removing these helps avoid some dependency errors
!pip install -qU "google-genai==1.7.0" "chromadb==0.6.3" "langchain-community==0.0.5" pymupdf

# In a new cell: import the libraries we just installed
import os
from kaggle_secrets import UserSecretsClient
from google import genai
from google.genai import types
from IPython.display import Markdown  # Useful for displaying formatted text in the notebook

# Get your API key securely from Kaggle Secrets
GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")

# Check the version of the google-genai library
genai.__version__

Explanation:

  • %%capture and !pip install...: This is where we install the libraries:

    • google-genai: The official Python library for interacting with Google's Gemini models.

    • chromadb: A simple and powerful vector database we'll use to store and search our document parts.

    • langchain-community: A library that provides helpful tools for working with LLMs, including document loaders.

    • pymupdf: A fast library for reading and processing PDFs.

  • import os and from kaggle_secrets import UserSecretsClient: These lines are how we access the secure API key we stored in Step 2.

  • The remaining from ... import ... lines bring in the specific classes and functions we need from the libraries we just installed.

  • GOOGLE_API_KEY = ...: This line retrieves the API key using the name "GOOGLE_API_KEY" that we set in Kaggle Secrets.

  • genai.__version__: Just a quick check to see which version of the Google GenAI library we're using.

Step 4: Initialize the Google Gemini Client and Check Models

Now that we have our API key and libraries, we can connect to Google's services.

# Initialize the Gemini client using your secure API key
client = genai.Client(api_key=GOOGLE_API_KEY)

# List the available models, specifically looking for ones that can create embeddings
print("Available Embedding Models:")
for m in client.models.list():
    if "embedContent" in m.supported_actions:
        print(m.name)

Explanation:

  • client = genai.Client(api_key=GOOGLE_API_KEY): This line creates our connection object to the Google GenAI API. We pass it the API key we retrieved earlier.

  • The loop for m in client.models.list(): iterates through all the models available to you via the API key.

  • if "embedContent" in m.supported_actions:: We're specifically interested in models that can perform the embedContent action – meaning they can convert text into numerical embeddings.

  • print(m.name): We print the names of these models. You'll likely see models like models/text-embedding-004.
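
If you're curious which models can also generate text (we'll need one of those for the answering step later), you can run the same kind of check and filter on the generateContent action instead. Exact model names vary by account and over time, so treat this as a quick sanity check rather than a fixed list:

# Optional: list models that support text generation (used in Step 9)
print("Available Generation Models:")
for m in client.models.list():
    if "generateContent" in m.supported_actions:
        print(m.name)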

Step 5: Load and Process Your PDF

We'll use the PyMuPDFLoader from langchain-community to read our PDF file. This loader is great because it automatically splits the PDF into individual pages.

from langchain_community.document_loaders import PyMuPDFLoader

# Specify the path to your PDF file in Kaggle Input (replace with your path if different!)
pdf_paths = ["/kaggle/input/pdf-of-reasoning-survey/a survey on efficient reasoning for large language models.pdf"]

# The Langchain loader expects a single path or list of paths
documents_path = pdf_paths[0] # We'll use the first path in our list

# Create a loader instance for your specific PDF file
loader = PyMuPDFLoader(documents_path)

# Load the PDF. This reads the file and splits it page by page.
pages = loader.load()

# We'll extract just the text content from each page object into a simple list
documents = []
for i in range(len(pages)):
    documents.append(pages[i].page_content)

# Print the total number of pages/documents loaded
print(f"Loaded {len(documents)} pages from the PDF.")

Explanation:

  • from langchain_community.document_loaders import PyMuPDFLoader: Imports the tool we need.

  • pdf_paths = [...]: This is where you'd put the path(s) to your PDF file(s) on Kaggle (from Step 1).

  • loader = PyMuPDFLoader(documents_path): Creates an object ready to load your PDF.

  • pages = loader.load(): Executes the loading. pages will be a list where each item represents a page from the PDF, containing both the text content and some metadata (like page number).

  • The loop for i in range(len(pages))... documents.append(pages[i].page_content) extracts only the text content (page_content) from each page object and puts it into a new list called documents. This list of text strings is what we'll work with for embeddings.

  • print(f"Loaded {len(documents)} pages..."): Confirms how many text chunks (pages) we successfully extracted.

Step 6: Create Embeddings (Turning Text into Numbers)

Computers are great with numbers, not so much with understanding the meaning of text directly. Embeddings solve this by converting text (like a sentence or a paragraph) into a list of numbers (a vector) where the distance and direction between vectors represent the semantic similarity between the original text pieces.

The Gemini text-embedding-004 model is excellent at creating these embeddings. ChromaDB needs a specific function to generate embeddings when you add data or query. So, we'll create a custom class that wraps the Gemini embedding model in a way ChromaDB understands.

from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry
from google.genai import types

# Define a helper to retry when we hit API rate limits (helps prevent errors)
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

# Create a custom Embedding Function for ChromaDB using the Gemini model
class GeminiEmbeddingFunction(EmbeddingFunction):
    # This flag helps the model know if it's embedding a document or a query
    document_mode = True

    # This decorator automatically retries the function if a retriable error occurs
    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        # Choose the correct task type based on whether we're embedding documents or queries
        if self.document_mode:
            embedding_task = "retrieval_document" # Use this for embedding your source text
        else:
            embedding_task = "retrieval_query"    # Use this for embedding your search question

        # Call the Gemini embedding model
        response = client.models.embed_content(
            model="models/text-embedding-004", # Specify the embedding model
            contents=input,                    # Pass the text input
            config=types.EmbedContentConfig(
                task_type=embedding_task,      # Set the task type
            ),
        )
        # Return the numerical embeddings
        return [e.values for e in response.embeddings]

Explanation:

  • from chromadb... from google.api_core...: Imports necessary components.

  • is_retriable = lambda e: ...: This is a small function that tells the @retry mechanism which kinds of errors (specifically, rate limits 429 and service unavailable 503) should trigger a retry.

  • class GeminiEmbeddingFunction(EmbeddingFunction):: We define a new class that inherits from ChromaDB's EmbeddingFunction. This makes it compatible with Chroma.

  • document_mode = True: A simple flag we add to switch the behavior between embedding source documents and embedding a search query (the Gemini embedding model performs slightly better when it knows the task).

  • @retry.Retry(...): This line is a Python "decorator" that wraps the __call__ method. If __call__ fails with an error that is_retriable identifies, the code will automatically try running it again a few times.

  • def __call__(self, input: Documents) -> Embeddings:: This is the core method that ChromaDB will call to get embeddings. It takes a list of text inputs (input) and is expected to return a list of numerical embeddings (Embeddings).

  • if self.document_mode: ... else: ...: Switches the task_type based on our internal flag.

  • response = client.models.embed_content(...): This is the actual call to the Google GenAI API to get the embeddings using the text-embedding-004 model.

  • return [e.values for e in response.embeddings]: Extracts the numerical vectors (values) from the API response and returns them in the format ChromaDB expects.
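
To convince yourself the embedding function works (and to see that similar sentences really do end up close together), here's a small optional sanity check. The example sentences are made up for illustration; text-embedding-004 should return 768-dimensional vectors:

import numpy as np

# Embed three short sentences with our custom function (document mode)
test_fn = GeminiEmbeddingFunction()
test_fn.document_mode = True
vecs = test_fn([
    "LLMs can overthink simple problems.",
    "Models sometimes produce overly long reasoning chains.",
    "I had pasta for dinner.",
])

# Cosine similarity: closer to 1 means more semantically similar
def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(len(vecs[0]))              # embedding dimensionality (expect 768)
print(cosine(vecs[0], vecs[1]))  # related sentences -> higher score
print(cosine(vecs[0], vecs[2]))  # unrelated sentence -> lower score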

Step 7: Set Up and Populate ChromaDB

Now that we can generate embeddings, we'll set up our ChromaDB collection (think of it like a table or index) and add our document pages along with their newly generated embeddings.

import chromadb

# Give your database collection a name
DB_NAME = "pdf_chat_db" # Changed name for clarity

# Create an instance of our custom embedding function, set to document mode
embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True # Make sure it's in document mode for adding data

# Initialize the ChromaDB client
chroma_client = chromadb.Client()

# Get or create a collection (where our data will live) and tell it to use our embedding function
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

# Add the document pages and their IDs to the ChromaDB collection.
# ChromaDB automatically calls our embed_fn to create and store embeddings here.
print(f"Adding {len(documents)} documents to ChromaDB...")
db.add(documents=documents, ids=[str(i) for i in range(len(documents))])
print("Done adding documents.")

# Check how many items are in the collection (should match the number of pages)
print(f"Total documents in DB: {db.count()}")

Explanation:

  • import chromadb: Imports the ChromaDB library.

  • DB_NAME = "pdf_chat_db": A simple name for our database collection.

  • embed_fn = GeminiEmbeddingFunction(): Creates an instance of the embedding function we defined.

  • embed_fn.document_mode = True: Explicitly sets the mode to True because we are embedding source documents for storage.

  • chroma_client = chromadb.Client(): Initializes ChromaDB. By default, this creates an in-memory database, which is fine for a tutorial. For storage that survives notebook restarts, you'd use a persistent client instead (see the sketch after this list).

  • db = chroma_client.get_or_create_collection(...): This line either connects to an existing collection named pdf_chat_db or creates it if it doesn't exist. We link our embed_fn to this collection, so ChromaDB knows how to generate embeddings for anything added to it.

  • db.add(documents=documents, ids=[str(i) for i in range(len(documents))]): This is the key step!

    • documents: The list of text strings (our PDF pages) from Step 5.

    • ids=[str(i) for i in range(len(documents))]: A list of unique IDs for each document. We're just using the page number as a string ID here (e.g., "0", "1", "2", ...).

    • When you call db.add, ChromaDB takes each document string, passes it to our embed_fn to get its embedding, and then stores both the original text and the embedding in the collection, associated with the given ID.

  • print(...): Shows the progress and confirms the count in the database.
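
To double-check what got stored, you can pull a record back out by ID. And if you'd like the index to survive notebook restarts, ChromaDB offers a persistent client that writes to disk instead of keeping everything in memory; a minimal sketch (the path is just an example):

# Optional: fetch the stored text for page "0" to confirm it was indexed
sample = db.get(ids=["0"], include=["documents"])
print(sample["documents"][0][:300])  # first 300 characters of that page

# For storage that persists between sessions, swap the in-memory client for:
# chroma_client = chromadb.PersistentClient(path="/kaggle/working/chroma_db")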

Step 8: Ask a Question and Retrieve Relevant Passages

Now that our PDF content is embedded and searchable in ChromaDB, we can ask a question! When we ask a question, we'll:

  1. Embed the question using the same Gemini model, but in "query mode".

  2. Ask ChromaDB to find document embeddings that are numerically closest to the query embedding.

ChromaDB returns the original text corresponding to those closest embeddings. These are the most relevant parts of your PDF to your question.

# Switch the embedding function to query mode
embed_fn.document_mode = False

# Define your question
query = "What are the different approaches discussed for efficient reasoning in LLMs?"

# Search the Chroma DB using the specified query.
# n_results=1 asks for the single most relevant document chunk (page)
result = db.query(query_texts=[query], n_results=1)

# The result object contains the text of the retrieved documents
# result["documents"] is a list of lists (one inner list per query, we only have one query)
[all_passages] = result["documents"]

# Display the first retrieved passage (the most relevant one)
print("--- Retrieved Passage ---")
print(all_passages[0])
print("-------------------------")

Explanation:

  • embed_fn.document_mode = False: Very important! We change the mode of our embedding function so that when we embed the query, the model understands it's for searching, not for storing.

  • query = "...": Your question!

  • result = db.query(query_texts=[query], n_results=1): This tells ChromaDB to perform a search.

    • query_texts=[query]: The question(s) to embed and search with.

    • n_results=1: We want the top 1 most relevant document chunk. You can increase this to get more context.

  • [all_passages] = result["documents"]: The search result is a dictionary. result["documents"] contains the original text of the retrieved document chunks. Since we only asked one query, result["documents"] is a list containing one item, which is itself a list of the retrieved document texts. This line unpacks that inner list into the all_passages variable, so all_passages is a list like ['Text of most relevant page', ...] — with n_results=1 it holds just a single passage.

  • print(all_passages[0]): Displays the text content of the very first (most relevant) page that ChromaDB found.
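
In practice, a single page is often not enough context, so you'll usually want a few passages. A small variation on the query above also asks ChromaDB to return distances (smaller means more similar), which is handy for eyeballing how good the matches are:

# Retrieve the top 3 passages along with their similarity distances
result = db.query(query_texts=[query], n_results=3,
                  include=["documents", "distances"])
[all_passages] = result["documents"]
[distances] = result["distances"]

for dist, passage in zip(distances, all_passages):
    print(f"distance={dist:.3f} | {passage[:120]}...")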

Step 9: Use Gemini LLM to Answer the Question (Retrieval Augmented Generation - RAG)

Retrieving relevant passages is great, but it just gives you the source text. To get a concise, direct answer, we'll use a Gemini model. We'll provide the retrieved text as context in the prompt and instruct the model to answer the question based only on that context. This technique is called Retrieval Augmented Generation (RAG).

query_oneline = query.replace("\n", " ") # Clean up the query for the prompt if it had newlines

# Construct the prompt for the Gemini model
prompt = f"""
You are a friendly and informative research assistant that helps people understand academic papers.
Your goal is to answer questions using the text from the REFERENCE PASSAGES provided.
You always break down complex topics into simple terms so non-technical readers can understand.
If the passage doesn’t help answer the question, it’s okay to say you can't find the answer in the provided text. Always aim to be accurate, clear, and easy to follow.

Here are some examples of how you should respond:

---

REFERENCE PASSAGE:
"Large Language Models (LLMs) have shown impressive abilities in complex tasks like math and programming.
But longer reasoning chains, while accurate, can lead to redundant and inefficient outputs — a problem called the 'overthinking phenomenon'.
Efficient reasoning methods aim to reduce this without losing accuracy, using strategies like reward-based training and smarter prompt design."

QUESTION:
What does the 'overthinking phenomenon' mean in LLMs?

ANSWER:
The "overthinking phenomenon" happens when language models give really long and detailed answers,
even when a shorter one would do. This wastes computing power and makes the model slower,
so researchers are now trying to make models reason more efficiently without losing accuracy.

---

REFERENCE PASSAGE:
"The survey classifies efficient reasoning into three categories: model-based methods (like using shorter training data or fine-tuning with rewards),
reasoning output methods (like compressing reasoning steps), and input prompt-based methods (like guiding the model to be concise)."

QUESTION:
What are the main strategies for improving reasoning efficiency?

ANSWER:
There are three main strategies to make language models reason more efficiently:
First, by training smaller or smarter models (model-based); second, by compressing how they think while answering (reasoning output-based);
and third, by tweaking the prompts we give them to be more to-the-point (prompt-based).

---

Now, answer the following question using only the REFERENCE PASSAGES appended after it:

QUESTION:
{query_oneline}
"""

# Add the retrieved passages from ChromaDB to the prompt
# We iterate through all_passages (even if n_results was 1, it's still a list)
for passage in all_passages:
    passage_oneline = passage.replace("\n", " ") # Clean up passage text for the prompt
    prompt += f"REFERENCE PASSAGE:\n{passage_oneline}\n\n" # Add each passage clearly labelled

# Print the complete prompt we are sending to the model (optional, but useful)
print("--- Full Prompt Sent to Gemini ---")
print(prompt)
print("---------------------------------")

# Call the Gemini model to generate the answer based on the prompt
answer = client.models.generate_content(
    model="gemini-2.0-flash", # Using a fast model like flash is often sufficient for RAG
    contents=prompt)

# Display the answer using Markdown for nice formatting
print("\n--- Generated Answer ---")
Markdown(answer.text)

Explanation:

  • query_oneline = ...: Just makes sure the query is a single line if it wasn't already.

  • prompt = f"""...""": This defines a multi-line string using an f-string, allowing us to easily insert variables.

    • Role/Instructions: The beginning sets the stage, telling the model how to behave (friendly, helpful, use provided text, simplify).

    • Few-Shot Examples: The sections marked --- and containing REFERENCE PASSAGE:, QUESTION:, and ANSWER: are "few-shot" examples. We provide these to show the model the desired input/output format and style. This helps it understand the task better. The examples here are hardcoded and are similar in topic to the paper, but their content doesn't come directly from the PDF loaded in Step 5. They serve as format guides.

    • QUESTION: {query_oneline}: Inserts the actual question we want answered.

    • The loop for passage in all_passages... prompt += ...: This is where we dynamically add the actual text passages that ChromaDB retrieved in Step 8 to the prompt. We label them clearly as REFERENCE PASSAGE: so the model knows this is the context to use.

  • print(prompt): Shows you the final text that gets sent to the Gemini model.

  • answer = client.models.generate_content(...): This calls the Gemini API.

    • model="gemini-2.0-flash": We specify which model to use. gemini-2.0-flash is a good balance of speed, cost, and capability for RAG tasks.

    • contents=prompt: We pass the entire constructed prompt to the model.

  • Markdown(answer.text): Displays the model's response. The Markdown function in Kaggle/Jupyter notebooks makes the output look nicer if the model uses Markdown formatting (like bullet points).

Putting It All Together

You've successfully built a system that:

  1. Loads your PDF document.

  2. Breaks it down into smaller pieces (pages).

  3. Converts the text of those pieces into numerical representations (embeddings) using Google Gemini.

  4. Stores these embeddings and the original text in a searchable database (ChromaDB).

  5. Takes your question, converts it into an embedding, and searches the database for the most relevant text chunks.

  6. Sends those relevant text chunks, along with your question and instructions, to a Google Gemini model.

  7. Gets an answer from the Gemini model that is grounded only in the text from your original PDF.

You've created a simple but powerful Q&A system for your documents using RAG!
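
If you plan to ask several questions, it helps to wrap the whole pipeline in one function. Here's a minimal sketch that reuses the db, embed_fn, and client objects from the steps above; it uses a shorter, plainer prompt than the few-shot version in Step 9, so adapt it to taste:

def ask_pdf(question: str, n_results: int = 3) -> str:
    # 1. Embed the question (query mode) and retrieve the most relevant pages
    embed_fn.document_mode = False
    result = db.query(query_texts=[question], n_results=n_results)
    [passages] = result["documents"]

    # 2. Build a prompt grounded in the retrieved passages
    context = "\n\n".join(f"REFERENCE PASSAGE:\n{p}" for p in passages)
    rag_prompt = (
        "Answer the QUESTION using only the REFERENCE PASSAGES below. "
        "If they don't contain the answer, say so.\n\n"
        f"{context}\n\nQUESTION:\n{question}\n\nANSWER:\n"
    )

    # 3. Ask Gemini for the grounded answer
    response = client.models.generate_content(
        model="gemini-2.0-flash", contents=rag_prompt)
    return response.text

# Example usage:
# print(ask_pdf("What is the 'overthinking phenomenon'?"))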

Wrapping Up

This tutorial showed you the core steps for building a document Q&A system on Kaggle using Google Gemini and ChromaDB. You can expand on this by:

  • Experimenting with n_results in the ChromaDB query to provide more or less context to the LLM.

  • Trying different PDF documents.

  • Using different prompt techniques to handle different types of questions or desired answer formats.

  • Exploring other embedding models or vector databases.

Hopefully, this makes interacting with your documents a little easier and shows you the power of combining embeddings, vector databases, and large language models!

Happy coding!
