Debating Agents

Smaranjit Ghose

🌟 Introduction

By 2025, agentic AI has transformed how we interact with large language models (LLMs). Since ChatGPT's debut in November 2022, LLMs have evolved from simple response generators into agents capable of planning, reasoning, acting, and reflecting. Yet despite these advances, some challenges persist, particularly in mathematics, where LLMs often struggle with precision.

To address this, many systems integrate external tools like symbolic math engines, Python interpreters, or hosted MCP servers. While effective, these solutions add complexity. What if we could leverage the LLMs themselves to improve accuracy? Enter Debating Agents, a lightweight technique where two AI models engage in a structured debate to refine their answers and converge on a solution. In this blog, we’ll explore how debating agents work, why they’re valuable, and how you can build your own using OpenAI’s GPT and Google’s Gemini, with a complete Python implementation.

🔬 How Do Debating Agents Work?

Debating agents operate on a simple yet powerful protocol:

  1. Ask Both Agents the Same Question: Two LLMs (e.g., GPT-4o-mini and Gemini-1.5-flash) receive the same question and are asked for an integer answer plus an explanation.

  2. Compare Answers: If the agents agree, the process stops. If not, the debate begins.

  3. Exchange Explanations: Each agent receives the other’s answer and explanation, along with a prompt to reconsider its position.

  4. Repeat Until Agreement (or Timeout): The debate continues for up to five rounds, stopping if the agents converge or when the maximum rounds are reached.

This approach mimics human debate, where differing perspectives can sharpen reasoning and lead to better outcomes.
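The four steps above fit in a short, provider-agnostic loop. This is an illustrative sketch rather than the implementation built later in this post: it assumes a hypothetical `Agent` type (a callable that takes a prompt and returns an integer answer plus an explanation), and the two toy agents are stand-ins for real LLM calls so the control flow can run offline.

```python
from typing import Callable

# Hypothetical agent interface: prompt in, (integer answer, explanation) out
Agent = Callable[[str], tuple[int, str]]


def run_debate(question: str, agent_a: Agent, agent_b: Agent, max_rounds: int = 5):
    """Return (rounds_used, final_answer_a, final_answer_b, agreed)."""
    # Step 1: ask both agents the same question
    a, b = agent_a(question), agent_b(question)
    # Step 2: stop immediately if they already agree
    if a[0] == b[0]:
        return 1, a[0], b[0], True
    # Steps 3-4: exchange answers and explanations until agreement or max_rounds
    for round_num in range(2, max_rounds + 1):
        prompt_a = f"{question}\nThe other agent answered {b[0]} because: {b[1]}\nReconsider."
        prompt_b = f"{question}\nThe other agent answered {a[0]} because: {a[1]}\nReconsider."
        a, b = agent_a(prompt_a), agent_b(prompt_b)
        if a[0] == b[0]:
            return round_num, a[0], b[0], True
    return max_rounds, a[0], b[0], False


# Toy agents for illustration only: one never changes its mind,
# the other adopts the peer's answer once it sees 477 in the prompt
def stubborn(prompt: str) -> tuple[int, str]:
    return 477, "inclusion-exclusion"


def persuadable(prompt: str) -> tuple[int, str]:
    if "answered 477" in prompt:
        return 477, "convinced by the peer's reasoning"
    return 333, "counted multiples of 3 only"


print(run_debate("How many ...?", stubborn, persuadable))  # (2, 477, 477, True)
```

Swapping the toy functions for wrappers around real API calls gives the same structure that the step-by-step implementation below follows.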

🏆 Why Debating Agents Matter

🔄 Traditional LLM Limitations:

  • Non-deterministic Outputs: LLMs can produce inconsistent answers to the same question.

  • Limited Self-Correction: Without external feedback, models may stick to incorrect reasoning.

  • Lack of Verification: A single model lacks a second perspective to challenge its assumptions.

Debating Agent Advantages:

  • Multi-Perspective Reasoning: Two models cross-check each other, exposing flaws in logic.

  • Built-in Self-Correction: Agents refine their answers based on the other’s reasoning.

  • Richer Explanations: Debates produce detailed, nuanced explanations.

  • Robustness for Complex Questions: Disagreements often lead to deeper exploration of ambiguous problems.

💼 Practical Use Cases

Debating agents shine in scenarios requiring precision or critical analysis:

  • 🧮 Math and Logic Problems: Solving equations, derivatives, integrals, or logical proofs.

  • 💡 Scientific Analysis: Comparing hypotheses or critiquing experimental designs.

  • 🔍 Fact Verification: Validating claims in news, history, or legal reasoning.

  • 📐 Programming Assistance: Comparing algorithm efficiency or debugging strategies.

🛠️ Building Your Own Debating Agents

Let’s dive into a Python implementation that pits GPT-4o-mini against Gemini-1.5-flash in a structured debate. It uses Pydantic to validate each agent's response and environment variables to keep API keys out of the source code.

Step 1: Initialize a new project

uv init debating-agents

Step 2: Install dependencies

  1. Navigate inside the directory

     cd debating-agents
    
  2. Add the dependencies. uv creates the project's virtual environment (.venv) if it does not exist yet and installs the packages into it

     uv add openai google-genai pydantic python-dotenv

Step 3: Configure environment variables

  1. Get the API keys from [OpenAI](https://platform.openai.com) and [Google](https://ai.google.dev/gemini-api/docs/api-key)

  2. Create a `.env` file inside the project directory

     touch .env

  3. Add the keys to the `.env` file

     OPENAI_API_KEY=your_openai_key
     GEMINI_API_KEY=your_gemini_key
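Before creating the API clients, it can help to fail fast with a readable message when a key is missing, rather than hitting an error deep inside an SDK call. A minimal sketch, assuming the two variable names above; `missing_keys` is a hypothetical helper, not part of python-dotenv:

```python
import os
from collections.abc import Mapping

# The two variables this tutorial's .env file defines
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required key names that are unset or empty in `env`."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

In `main.py` you could call `missing_keys()` right after `load_dotenv()` (so values from `.env` are visible in `os.environ`) and stop with `raise SystemExit(...)` if the returned list is not empty.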

Step 4: Basic Setup

  • Clear the placeholder code in main.py (some versions of uv generate hello.py instead)

  • Import the modules

import os
from openai import OpenAI
from google import genai
from dotenv import load_dotenv
from pydantic import BaseModel, field_validator
  • Read the API Keys and initialize the respective API clients
# Load environment variables
load_dotenv()

# Initialize API clients
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
gemini_client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
  • Set a value for the max number of rounds allowed
MAX_ROUNDS = 5

If MAX_ROUNDS is N, the models debate for at most N - 1 rounds, because in round 1 both LLMs generate their answers independently.
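A quick way to see the round count, and the worst-case number of API calls it implies, is to evaluate the same `range` that the debate loop in Step 9 iterates over. This assumes `MAX_ROUNDS = 5` and one call per model per round, as in this tutorial's code:

```python
MAX_ROUNDS = 5

# Round 1 is independent; the debate loop then covers rounds 2..MAX_ROUNDS
debate_rounds = list(range(2, MAX_ROUNDS + 1))
print(debate_rounds)       # [2, 3, 4, 5]
print(len(debate_rounds))  # 4

# Worst case (no consensus): one GPT call and one Gemini call in every round
max_api_calls = 2 * MAX_ROUNDS
print(max_api_calls)       # 10
```

Raising MAX_ROUNDS gives the models more chances to converge, but each extra round can add two paid API calls.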

Step 5: Define the Response Data Model

Use Pydantic to create a data model for structuring and validating agent responses.

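import re  # used to pull the integer out of the model's answer line

class AgentResponse(BaseModel):
    answer: int
    explanation: str

    @field_validator("answer", mode="before")
    @classmethod
    def parse_answer(cls, v):
        # Already an integer: nothing to parse
        if isinstance(v, int):
            return v
        # Take the first (possibly negative) integer in text such as "477", "**477**" or "1,000."
        match = re.search(r"-?\d[\d,]*", str(v))
        if match is None:
            raise ValueError("Invalid integer format for answer.")
        return int(match.group().replace(",", ""))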

Step 6: Create Functions to Query AI Agents

def ask_gpt(prompt: str) -> AgentResponse:
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    content = response.choices[0].message.content.strip()
    return parse_structured_response(content)

def ask_gemini(prompt: str) -> AgentResponse:
    response = gemini_client.models.generate_content(
        model="gemini-1.5-flash",
        contents=prompt
    )
    content = response.text.strip()
    return parse_structured_response(content)

Step 7: Parse API Responses

def parse_structured_response(response: str) -> AgentResponse:
    answer = None
    explanation = []
    for raw_line in response.splitlines():
        # Trim whitespace and Markdown bold markers, e.g. "**Answer:** 477" -> "Answer: 477"
        line = raw_line.strip().replace("**", "")
        if not line:
            continue  # skip blank lines so the explanation has no stray gaps
        if line.lower().startswith("answer:"):
            answer = line.split(":", 1)[-1].strip()
        elif line.lower().startswith("explanation:"):
            explanation.append(line.split(":", 1)[-1].strip())
        else:
            explanation.append(line)
    if answer is None:
        raise ValueError("Answer not found in response.")
    return AgentResponse(answer=answer, explanation=" ".join(explanation))

Step 8: Generate Reconsideration Prompts

Create a prompt that asks an agent to reconsider its answer in light of the other agent's response. Each API call is stateless, so the prompt restates the original question; it also instructs the agent to reply in the structured format (Answer: <integer>, Explanation: <text>).

def create_reconsideration_prompt(question: str, your_response: AgentResponse, peer_response: AgentResponse) -> str:
    # No conversation history is sent with each call, so repeat the question every round
    return f"""
Question: {question}

You previously answered:
Answer: {your_response.answer}
Explanation: {your_response.explanation}

Another agent answered:
Answer: {peer_response.answer}
Explanation: {peer_response.explanation}

Would you like to reconsider your answer or explanation?

Please respond in this exact format:
Answer: <your new or reaffirmed integer answer>
Explanation: <your explanation or updated reasoning>
"""

Step 9: Craft the Debate Logic

This function orchestrates the debate: it runs the rounds and checks for consensus after each one.

def debate_agents(question: str) -> None:
    initial_prompt = (
        "Answer the following question. Please respond in the format:\n"
        "Answer: <integer>\nExplanation: <your explanation>\n\n"
        f"Question: {question}"
    )

    # Round 1: both agents answer independently
    print("\n--- Round 1 ---")
    gpt_response = ask_gpt(initial_prompt)
    gemini_response = ask_gemini(initial_prompt)

    print(f"GPT Answer: {gpt_response.answer}")
    print(f"Gemini Answer: {gemini_response.answer}")

    if gpt_response.answer == gemini_response.answer:
        print("\n✅ Consensus Reached on Round 1!")
        return

    # Rounds 2..MAX_ROUNDS: each agent reconsiders in light of the other's answer
    for round_num in range(2, MAX_ROUNDS + 1):
        print(f"\n--- Round {round_num} ---")
        gpt_prompt = create_reconsideration_prompt(question, gpt_response, gemini_response)
        gemini_prompt = create_reconsideration_prompt(question, gemini_response, gpt_response)

        gpt_response = ask_gpt(gpt_prompt)
        gemini_response = ask_gemini(gemini_prompt)

        print(f"GPT Answer: {gpt_response.answer}")
        print(f"Gemini Answer: {gemini_response.answer}")

        if gpt_response.answer == gemini_response.answer:
            print("\n✅ Consensus Reached!")
            return

    print("\n❌ Max rounds reached. No consensus.")
    print(f"\nGPT Final Answer: {gpt_response.answer}\nExplanation: {gpt_response.explanation}")
    print(f"\nGemini Final Answer: {gemini_response.answer}\nExplanation: {gemini_response.explanation}")
  • Round 1: Prompts both agents with the initial question and checks if their answers match.

  • Subsequent Rounds: If no consensus, generates reconsideration prompts, queries both agents, and checks again.

  • Termination: Stops if the agents agree or after MAX_ROUNDS (5). If no consensus, displays both final answers and explanations.
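Within a round, the GPT and Gemini calls do not depend on each other, so they could run concurrently instead of back to back. A minimal sketch using the standard library's `ThreadPoolExecutor`; `ask_both` is a hypothetical helper, and it assumes the SDK clients are safe to call from worker threads:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def ask_both(ask_a: Callable[[str], A], prompt_a: str,
             ask_b: Callable[[str], B], prompt_b: str) -> tuple[A, B]:
    """Run two blocking agent calls concurrently and return (reply_a, reply_b)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(ask_a, prompt_a)
        future_b = pool.submit(ask_b, prompt_b)
        # result() waits for each call and re-raises any exception it raised
        return future_a.result(), future_b.result()


# Offline demo with stand-in functions instead of real API calls
print(ask_both(str.upper, "gpt prompt", str.title, "gemini prompt"))  # ('GPT PROMPT', 'Gemini Prompt')
```

In `debate_agents`, the two sequential calls in each round would then become `gpt_response, gemini_response = ask_both(ask_gpt, gpt_prompt, ask_gemini, gemini_prompt)`, so the wait per round becomes the slower of the two calls rather than their sum.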

Step 10: Start the Debate

if __name__ == "__main__":
    question_input = input("🗣️ Enter a question: ")
    debate_agents(question_input)

Step 11: Try out the debating agents!

uv run main.py

🧪 Example:

🗣️ Enter a question: How many integers between 1 and 1000 are divisible by 3, 5, or 7 but not 3 and 5?

--- Round 1 ---
GPT Answer: 477
Gemini Answer: 333

--- Round 2 ---
GPT Answer: 477
Gemini Answer: 477

✅ Consensus Reached!
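To double-check the agreed answer, this question is small enough to count directly. Reading "between 1 and 1000" as inclusive and "not 3 and 5" as "not divisible by both 3 and 5" (that is, not by 15), inclusion-exclusion gives 333 + 200 + 142 - 66 - 47 - 28 + 9 = 543 numbers divisible by 3, 5, or 7; removing the 66 multiples of 15 leaves 477. A brute-force sketch under that same reading (`count_matching` is a hypothetical name):

```python
def count_matching(limit: int = 1000) -> int:
    """Count n in 1..limit divisible by 3, 5, or 7 but not by both 3 and 5 (i.e. not by 15)."""
    return sum(
        1
        for n in range(1, limit + 1)
        if (n % 3 == 0 or n % 5 == 0 or n % 7 == 0) and n % 15 != 0
    )


print(count_matching())  # 477
```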

📋 Closing Notes

Debating Agents harness the combined reasoning of large language models by turning disagreement into a trigger for closer scrutiny. A structured dialogue between models like GPT-4o-mini and Gemini-1.5-flash lets each model challenge the other's reasoning, which can catch mistakes a single model would let through, without the extra machinery of external tools. From math problems to fact verification and code review, Debating Agents are a versatile, lightweight option when accuracy matters.

As agentic AI continues to evolve in 2025, collaborative reasoning patterns like Debating Agents can help your applications stand out. The Python implementation here is a starting point: experiment with additional models, adapt the system for open-ended questions, or build a real-time interface to visualize the debate.

