Building a Multi-Agent Workflow for Design and Coding


In my work with coding agents, I’ve run into a recurring frustration: I kept doing the same jobs over and over.
- Checking that we use dependency injection instead of ad-hoc mocks.
- Ensuring we write tests (and don’t weaken or remove them).
- Spotting code duplication.
- Enforcing the same quality rules that matter to me.
These aren’t glamorous tasks, but they’re critical. And they felt like the kind of things I should be able to delegate. That became the seed for my multi-agent workflow — a system where multiple agents, each representing a perspective I care about, collaborate to analyze features, design solutions, and create implementation plans.
The Agent Team
Instead of a single coding agent, I spun up four agents with distinct roles:
Fast Iterating Developer
- Goal: Ship small, working increments quickly; prefer straightforward paths to progress.
- Biases: Rapid feedback over polish; accepts tactical debt if it’s explicitly documented and ticketed.
- Responsibilities: Propose the minimal viable changes, outline a short iteration plan, flag any corners cut.
- Quality guardrails: Must honor existing interfaces, dependency injection boundaries, and pass current tests before proposing new ones.
- Link: Fast Iterating Dev prompt
Test-Conscious Developer
- Goal: Maintain and improve test rigor without slowing development to a crawl.
- Biases: Favors integration tests for behavioral guarantees; resists weakening or deleting tests without justification.
- Responsibilities: Identify risk areas, propose test additions (unit/integration), define acceptance checks and regression traps.
- Quality guardrails: Avoids mocking concrete implementations when DI makes seams available; documents coverage deltas.
- Link: Test-Conscious Dev prompt
Senior Engineer
- Goal: Keep code simple and expressive; enforce consistency and thoughtful application of patterns.
- Biases: Favors clarity over cleverness; removes accidental complexity; pushes for consistent naming, module boundaries, and API shape.
- Responsibilities: Propose refactors for readability/maintainability, highlight duplication, suggest idiomatic patterns already used in the repo.
- Quality guardrails: Upholds dependency injection over ad-hoc mocks; aligns proposals with existing conventions captured in the codebase analysis.
- Link: Senior Engineer prompt
Architect
- Goal: Preserve a coherent, scalable architecture; avoid the “bag of parts.”
- Biases: Extends the system minimally to fit new needs; discourages “just one more component” proliferation.
- Responsibilities: Validate boundaries, data flow, ownership, and failure modes; ensure new design fits the long-term direction.
- Quality guardrails: Explicitly calls out coupling risks, migration/compatibility concerns, and cross-cutting concerns (observability, auth, config).
- Link: Architect prompt
Each one encodes a mindset I normally apply (or remind others to apply) during reviews — with particular emphasis on the aspects that are important to me, like dependency injection or avoiding test shortcuts.
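To make this concrete, here is a minimal sketch of how such persona charters could be captured as data that an orchestrator turns into system prompts. This is an assumed structure for illustration only; the real charters are the prompt files linked above.

```python
# Hypothetical sketch: persona charters as plain data. The real charters
# live in the linked prompt files in the repo.
from dataclasses import dataclass


@dataclass
class Persona:
    name: str
    goal: str
    guardrails: list[str]


PERSONAS = [
    Persona(
        name="fast_iterating_developer",
        goal="Ship small, working increments quickly.",
        guardrails=["Honor existing interfaces and DI boundaries",
                    "Pass current tests before proposing new ones"],
    ),
    Persona(
        name="test_conscious_developer",
        goal="Maintain and improve test rigor.",
        guardrails=["Never weaken or delete tests without justification",
                    "Prefer DI seams over mocking concrete implementations"],
    ),
    Persona(
        name="senior_engineer",
        goal="Keep code simple, expressive, and consistent.",
        guardrails=["Flag duplication", "Follow existing repo conventions"],
    ),
    Persona(
        name="architect",
        goal="Preserve a coherent, scalable architecture.",
        guardrails=["Call out coupling and migration risks",
                    "Avoid component proliferation"],
    ),
]


def system_prompt(p: Persona) -> str:
    """Render a charter into a system prompt for the agent runtime."""
    rules = "\n".join(f"- {g}" for g in p.guardrails)
    return f"You are the {p.name.replace('_', ' ')}. Goal: {p.goal}\nGuardrails:\n{rules}"
```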
Orchestrating the Workflow
The workflow is structured, but produces a surprisingly rich set of artifacts. Here’s how it works:
Codebase Analysis (Step 0): The Senior Engineer analyzes the codebase and produces a document that describes the directory structure, coding patterns, and conventions. This ensures every agent begins from the same shared context.
Analysis (Step 1): Each agent produces an analysis document. This results in:
- task_specification.md (feature description)
- task_metadata.json (if derived from a PRD)
- codebase_analysis.md (repository overview from Step 0)
- One analysis document per agent (developer, tester, senior, architect)
- context_pr.json (workflow state)
Design Consolidation (Step 2): The agents review each other’s analyses, highlight conflicts, and generate:
- consolidated_design.md (unified design)
- conflict_resolution.md (how disagreements were settled)
Design Finalization (Step 3): I review the consolidated design in a GitHub PR. My feedback is ingested and used to generate:
- finalized_design.md (production-ready design)
- feedback_incorporation_summary.md (what changed and why)
Ready for Development (Step 4): A bundle of documents (final design, codebase analysis, task spec, workflow context) becomes the blueprint for implementation.
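As an illustration of how these steps hang together, here is a minimal sketch of a step pipeline that passes the shared workflow state between stages. The structure and file names are taken from the steps above, but the code itself is an assumption, not the repo’s actual orchestration code.

```python
# Hypothetical sketch of the step pipeline: each step reads the shared
# workflow context, produces its documents, and writes the context back.
import json
from pathlib import Path


def load_context(path: Path) -> dict:
    return json.loads(path.read_text()) if path.exists() else {"step": 0, "documents": []}


def save_context(path: Path, ctx: dict) -> None:
    path.write_text(json.dumps(ctx, indent=2))


def run_step(ctx: dict, name: str, outputs: list[str]) -> dict:
    # In the real workflow this is where the agents are invoked and the
    # resulting markdown documents are written and pushed to the PR.
    ctx["step"] += 1
    ctx["documents"].extend(outputs)
    print(f"Step {ctx['step']}: {name} -> {', '.join(outputs)}")
    return ctx


if __name__ == "__main__":
    context_path = Path("context_pr.json")  # per-PR workflow state
    ctx = load_context(context_path)
    ctx = run_step(ctx, "analysis", ["codebase_analysis.md", "developer_analysis.md",
                                     "tester_analysis.md", "senior_engineer_analysis.md",
                                     "architect_analysis.md"])
    ctx = run_step(ctx, "design consolidation", ["consolidated_design.md",
                                                 "conflict_resolution.md"])
    ctx = run_step(ctx, "design finalization", ["finalized_design.md",
                                                "feedback_incorporation_summary.md"])
    save_context(context_path, ctx)
```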
Here’s the document flow diagram:
graph TB
%% Input Sources
subgraph Input Sources
PRD["PRD File\nproduct_requirements.md"]
TaskFile["Task File\ntask.md"]
Feature["Feature Name\nexample User Authentication"]
end
%% Step 1: Analysis
subgraph Step 1 Analysis
ExtractFeature["Senior Engineer\nExtracts Feature"]
CreateDocs1["Generate Documents"]
subgraph Step 1 Outputs
TaskSpec["task_specification.md\nExtracted or Original Feature"]
TaskMeta["task_metadata.json\nPRD source info"]
CodebaseAnalysis["codebase_analysis.md\nRepository patterns"]
subgraph Round 1 Analysis
ArchAnalysis["architect_analysis.md"]
DevAnalysis["developer_analysis.md"]
SeniorAnalysis["senior_engineer_analysis.md"]
TesterAnalysis["tester_analysis.md"]
end
Context1["context_pr_XXX.json\nWorkflow state"]
end
end
%% GitHub PR 1
subgraph GitHub PR After Step 1
PR1["Pull Request #XXX\nContains all analysis docs"]
Comments1["Your Comments\non Analysis Docs"]
end
%% Step 2: Design Consolidation
subgraph Step 2 Design Consolidation
LoadContext2["Load Context\nand Analysis"]
FetchComments2["Fetch PR Comments"]
Consolidate["Agents Review\nEach Others Work"]
ResolveConflicts["Resolve Conflicts"]
subgraph Step 2 Outputs
ConsolidatedDesign["consolidated_design.md\nUnified design document"]
ConflictResolution["conflict_resolution.md\nHow conflicts were resolved"]
Context2["Updated context_pr_XXX.json"]
end
end
%% GitHub PR 2
subgraph GitHub PR After Step 2
PR2["Same PR #XXX\nNow has design doc"]
Comments2["Your Comments\non Design Document"]
end
%% Step 3: Finalize Design
subgraph Step 3 Finalize Design
LoadContext3["Load Context\nand Design"]
FetchComments3["Fetch PR Comments\non Design"]
CategorizeFeedback["Categorize Feedback"]
UpdateDesign["Update Design\nBased on Feedback"]
subgraph Step 3 Outputs
FinalDesign["finalized_design.md\nProduction ready design"]
FeedbackSummary["feedback_incorporation_summary.md\nWhat changed and why"]
Context3["Final context_pr_XXX.json"]
end
end
%% Final Documents for Development
subgraph Ready for Development
FinalDocs["Documents for Step 4:\n- finalized_design.md\n- codebase_analysis.md\n- task_specification.md\n- context_pr_XXX.json"]
end
%% Connections - Input Flow
PRD --> |--prd-file| ExtractFeature
Feature --> |--feature| ExtractFeature
TaskFile --> |Direct input| CreateDocs1
ExtractFeature --> TaskSpec
ExtractFeature --> TaskMeta
%% Step 1 Flow
TaskSpec --> CreateDocs1
CreateDocs1 --> CodebaseAnalysis
CreateDocs1 --> ArchAnalysis
CreateDocs1 --> DevAnalysis
CreateDocs1 --> SeniorAnalysis
CreateDocs1 --> TesterAnalysis
CreateDocs1 --> Context1
%% Step 1 to GitHub
ArchAnalysis --> PR1
DevAnalysis --> PR1
SeniorAnalysis --> PR1
TesterAnalysis --> PR1
CodebaseAnalysis --> PR1
%% User Feedback 1
PR1 --> Comments1
Comments1 --> |Feedback on analysis| FetchComments2
%% Step 2 Flow
Context1 --> |--pr XXX| LoadContext2
LoadContext2 --> Consolidate
FetchComments2 --> Consolidate
Consolidate --> ResolveConflicts
ResolveConflicts --> ConsolidatedDesign
ResolveConflicts --> ConflictResolution
ConsolidatedDesign --> Context2
%% Step 2 to GitHub
ConsolidatedDesign --> PR2
ConflictResolution --> PR2
%% User Feedback 2
PR2 --> Comments2
Comments2 --> |Feedback on design| FetchComments3
%% Step 3 Flow
Context2 --> |--pr XXX| LoadContext3
ConsolidatedDesign --> LoadContext3
LoadContext3 --> CategorizeFeedback
FetchComments3 --> CategorizeFeedback
CategorizeFeedback --> UpdateDesign
UpdateDesign --> FinalDesign
UpdateDesign --> FeedbackSummary
FinalDesign --> Context3
%% Final Output
FinalDesign --> FinalDocs
CodebaseAnalysis --> FinalDocs
TaskSpec --> FinalDocs
Context3 --> FinalDocs
%% Styling
classDef input fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef process fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef output fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef github fill:#c8e6c9,stroke:#1b5e20,stroke-width:2px
classDef final fill:#ffebee,stroke:#b71c1c,stroke-width:3px
class PRD,TaskFile,Feature input
class ExtractFeature,CreateDocs1,LoadContext2,FetchComments2,Consolidate,ResolveConflicts,LoadContext3,FetchComments3,CategorizeFeedback,UpdateDesign process
class TaskSpec,TaskMeta,CodebaseAnalysis,ArchAnalysis,DevAnalysis,SeniorAnalysis,TesterAnalysis,Context1,ConsolidatedDesign,ConflictResolution,Context2,FinalDesign,FeedbackSummary,Context3 output
class PR1,Comments1,PR2,Comments2 github
class FinalDocs final
The richness of this process surprised me — by the time development starts, I already have a codebase analysis, per-agent perspectives, a consolidated design, conflict notes, a feedback summary, and a finalized design doc.
The Glue: MCP + PR Feedback
This workflow wouldn’t work without my MCP server. It:
- reads PR comments, processes them, and feeds them back to the agents.
- posts replies directly into GitHub PRs.
- runs custom tools (e.g., parsing build logs, surfacing lint errors, or reading test outputs).
- manages state and context across steps.
Essentially, it acts as the orchestration backbone — ensuring agents don’t just generate documents in isolation, but participate in a continuous, feedback-driven process.
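For a sense of what one of these tools looks like, here is a simplified sketch of a PR-comment fetcher, assuming the official MCP Python SDK (the `mcp` package) and the GitHub REST API. It is an illustration, not the actual server code from the repo.

```python
# Hypothetical sketch of one MCP tool, not the actual server from the repo.
# Assumes the official MCP Python SDK and a GitHub token in the environment.
import os

import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("github-pr-feedback")


@mcp.tool()
def fetch_pr_comments(owner: str, repo: str, pr_number: int) -> list[dict]:
    """Return the conversation comments on a pull request so agents can ingest feedback."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()
    return [{"author": c["user"]["login"], "body": c["body"]} for c in resp.json()]


if __name__ == "__main__":
    mcp.run()  # serve the tool over stdio for the agent runtime to call
```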
The Collaboration Problem
One challenge I haven’t solved is how to create a truly collaborative environment. In a real-world team, discussions are messy: people jump in with half-baked ideas, questions, and counterpoints in no particular order. I haven’t figured out how to reproduce that with agents. Instead, my workflow is serialized: one agent at a time, in a fixed order. That comes with trade-offs:
- Bias: the first agent to write sets the tone for everyone else.
- Stopping criteria: when is enough iteration enough?
This serialization works, but it feels more rigid than human collaboration.
Context Engineering: The Real Challenge
Another hard problem was context engineering (see https://www.promptingguide.ai/guides/context-engineering-guide). The key wasn’t just writing prompts — it was carefully deciding what information each agent should see, when.
Some of the pitfalls I ran into:
- Agents generating giant design docs with entire chunks of code embedded.
- Agents producing inconsistent or even empty documents.
The coding agent that I used to build this workflow kept embedding concrete details (like class names or script names) directly into prompts — I had to keep reminding it: extract details from the repo, don’t hardcode them, keep it generic.
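To show what “deciding what each agent sees, when” can look like in practice, here is a simplified sketch of per-step context selection. The document names come from the workflow above, but the structure and helper are assumptions, not the repo’s actual code.

```python
# Hypothetical sketch: explicitly whitelist which documents each agent sees at
# each step, instead of dumping everything (or raw code) into every prompt.
from pathlib import Path

CONTEXT_PLAN = {
    # Step 1: every agent sees the task spec and the codebase analysis, nothing else.
    1: {"*": ["task_specification.md", "codebase_analysis.md"]},
    # Step 2: each agent additionally sees the other agents' analyses.
    2: {"*": ["task_specification.md", "codebase_analysis.md",
              "developer_analysis.md", "tester_analysis.md",
              "senior_engineer_analysis.md", "architect_analysis.md"]},
    # Step 3: only the consolidated design and the PR feedback are needed.
    3: {"*": ["consolidated_design.md", "conflict_resolution.md"]},
}


def build_context(step: int, agent: str, docs_dir: Path, max_chars: int = 20_000) -> str:
    """Concatenate only the allowed documents, truncated to keep prompts small."""
    allowed = CONTEXT_PLAN[step].get(agent, CONTEXT_PLAN[step]["*"])
    parts = []
    for name in allowed:
        path = docs_dir / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()[:max_chars]}")
    return "\n\n".join(parts)
```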
And one more practical problem: Sourcegraph AMP lets agents persist, but their local state is tied to the filesystem. I had to figure out how to:
- Start each agent in its own scratch location.
- Switch them into the actual codebase directory without losing context.
Sounds simple — but too often, the response I’d get was: “I can’t find any code.” Solving that required a surprising amount of trial and error.
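Stripped of the AMP-specific details, the workaround amounts to giving each agent its own scratch workspace while always spelling out the repo path explicitly, rather than relying on the agent’s working directory. The sketch below is a simplified illustration of that idea, with a placeholder repo path; it is not the actual AMP integration code.

```python
# Hypothetical sketch: per-agent scratch workspaces plus an explicit repo path,
# so an agent never has to guess where the code lives.
import tempfile
from pathlib import Path

REPO_ROOT = Path("/path/to/your/repo").resolve()  # placeholder path


def make_agent_workspace(agent_name: str) -> Path:
    """Create a scratch directory where the agent keeps its own notes and drafts."""
    return Path(tempfile.mkdtemp(prefix=f"agent_{agent_name}_"))


def initial_instructions(agent_name: str, scratch: Path) -> str:
    # Spell out both locations explicitly; vague "cd into the repo" instructions
    # were what produced the "I can't find any code" responses.
    return (
        f"You are the {agent_name} agent.\n"
        f"Write your working notes to: {scratch}\n"
        f"The codebase you are analyzing is at: {REPO_ROOT}\n"
        f"Always read and reference files under {REPO_ROOT}, not your scratch directory."
    )
```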
A Concrete Example
To test the workflow, I used a relatively simple feature: store in a database whether I already replied to a PR comment (to avoid duplicate replies).
Here’s how the agents responded:
- Fast Developer: proposed a quick schema and plan.
- Test-Conscious Developer: raised consistency concerns and suggested integration tests.
- Senior Engineer: simplified the schema, reduced unnecessary complexity, and enforced consistency.
- Architect: confirmed the new table aligned with the broader DB model.
The final design was more balanced and higher quality than any single agent’s output — and I could trace its evolution across the analysis, consolidation, and finalization stages.
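For reference, a minimal version of the kind of reply-tracking table this feature calls for might look like the sketch below. Table and column names are illustrative; the actual schema that came out of the workflow lives in the repo.

```python
# Illustrative sketch of a reply-tracking table using SQLite; names are
# hypothetical, not the exact schema produced by the workflow.
import sqlite3


def ensure_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS comment_replies (
            comment_id INTEGER PRIMARY KEY,   -- GitHub comment ID we replied to
            pr_number  INTEGER NOT NULL,
            replied_at TEXT    NOT NULL       -- ISO timestamp of our reply
        )
        """
    )


def already_replied(conn: sqlite3.Connection, comment_id: int) -> bool:
    row = conn.execute(
        "SELECT 1 FROM comment_replies WHERE comment_id = ?", (comment_id,)
    ).fetchone()
    return row is not None
```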
Artifacts
Some of the most useful artifacts that emerge from this workflow:
- Agent Prompts — charters defining each persona.
- Workflow Diagram — the document flow in Mermaid.
- Example Outputs — individual analysis vs. consolidated design.
All the code for this workflow is here: 👉 github.com/MarksStuff/github-agent/tree/main/multi_agent_workflow
Reflections
This multi-agent workflow is still experimental, but I’ve learned a few things:
- Agents focused on aspects you care about (dependency injection, testing discipline, scalability) can save you from repeating the same checks.
- Human in the loop is critical — PR comments act as arbitration between agents.
- Serialized workflows are a compromise — structured, but biased by order.
- Context engineering is the real bottleneck — harder and more fragile than prompt writing.
- State management is tricky — keeping agents “alive” while pointing them at the real codebase isn’t trivial.
Closing
I don’t think the future of AI in coding is “one agent that does everything.” It’s multi-agent workflows: diverse perspectives, structured collaboration, and better outcomes than any single agent can produce - similar to how we humans build software.
This workflow is my first attempt at that. It’s messy and imperfect, but it’s working.
👉 I’d love to hear: if you built your own multi-agent workflow, what roles did you use? And could you figure out a truly collaborative workflow?