Building a Multi-Agent Workflow for Design and Coding


In my work with coding agents, I’ve run into a recurring frustration: I kept doing the same jobs over and over.
- Checking that we use dependency injection instead of ad-hoc mocks.
- Ensuring we write tests (and don’t weaken or remove them).
- Spotting code duplication.
- Enforcing the same quality rules that matter to me.
These aren’t glamorous tasks, but they’re critical. And they felt like the kind of things I should be able to delegate. That became the seed for my multi-agent workflow — a system where multiple agents, each representing a perspective I care about, collaborate to analyze features, design solutions, and create implementation plans.
The Agent Team
Instead of a single coding agent, I spun up four agents with distinct roles:
Fast Iterating Developer
- Goal: Ship small, working increments quickly; prefer straightforward paths to progress.
- Biases: Rapid feedback over polish; accepts tactical debt if it’s explicitly documented and ticketed.
- Responsibilities: Propose the minimal viable changes, outline a short iteration plan, flag any corners cut.
- Quality guardrails: Must honor existing interfaces, dependency injection boundaries, and pass current tests before proposing new ones.
- Link: Fast Iterating Dev prompt
Test-Conscious Developer
- Goal: Maintain and improve test rigor without slowing development to a crawl.
- Biases: Favors integration tests for behavioral guarantees; resists weakening or deleting tests without justification.
- Responsibilities: Identify risk areas, propose test additions (unit/integration), define acceptance checks and regression traps.
- Quality guardrails: Avoids mocking concrete implementations when DI makes seams available; documents coverage deltas.
- Link: Test-Conscious Dev prompt
Senior Engineer
- Goal: Keep code simple and expressive; enforce consistency and thoughtful application of patterns.
- Biases: Favors clarity over cleverness; removes accidental complexity; pushes for consistent naming, module boundaries, and API shape.
- Responsibilities: Propose refactors for readability/maintainability, highlight duplication, suggest idiomatic patterns already used in the repo.
- Quality guardrails: Upholds dependency injection over ad-hoc mocks; aligns proposals with existing conventions captured in the codebase analysis.
- Link: Senior Engineer prompt
Architect
- Goal: Preserve a coherent, scalable architecture; avoid the “bag of parts.”
- Biases: Extends the system minimally to fit new needs; discourages “just one more component” proliferation.
- Responsibilities: Validate boundaries, data flow, ownership, and failure modes; ensure new design fits the long-term direction.
- Quality guardrails: Explicitly calls out coupling risks, migration/compatibility concerns, and cross-cutting concerns (observability, auth, config).
- Link: Architect prompt
Each one encodes a mindset I normally apply (or remind others to apply) during reviews — with particular emphasis on the aspects that are important to me, like dependency injection or avoiding test shortcuts.
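To make this concrete, here is a minimal sketch of how such persona charters could be captured as data that an orchestrator turns into system prompts. This is an assumed structure for illustration only; the real charters are the prompt files linked above.

```python
# Hypothetical sketch: persona charters as plain data. The real charters
# live in the linked prompt files in the repo.
from dataclasses import dataclass


@dataclass
class Persona:
    name: str
    goal: str
    guardrails: list[str]


PERSONAS = [
    Persona(
        name="fast_iterating_developer",
        goal="Ship small, working increments quickly.",
        guardrails=["Honor existing interfaces and DI boundaries",
                    "Pass current tests before proposing new ones"],
    ),
    Persona(
        name="test_conscious_developer",
        goal="Maintain and improve test rigor.",
        guardrails=["Never weaken or delete tests without justification",
                    "Prefer DI seams over mocking concrete implementations"],
    ),
    Persona(
        name="senior_engineer",
        goal="Keep code simple, expressive, and consistent.",
        guardrails=["Flag duplication", "Follow existing repo conventions"],
    ),
    Persona(
        name="architect",
        goal="Preserve a coherent, scalable architecture.",
        guardrails=["Call out coupling and migration risks",
                    "Avoid component proliferation"],
    ),
]


def system_prompt(p: Persona) -> str:
    """Render a charter into a system prompt for the agent runtime."""
    rules = "\n".join(f"- {g}" for g in p.guardrails)
    return f"You are the {p.name.replace('_', ' ')}. Goal: {p.goal}\nGuardrails:\n{rules}"
```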
Orchestrating the Workflow
The workflow is structured, but produces a surprisingly rich set of artifacts. Here’s how it works:
Codebase Analysis (Step 0): The Senior Engineer analyzes the codebase and produces a document that describes the directory structure, coding patterns, and conventions. This ensures every agent begins from the same shared context.
Analysis (Step 1): Each agent produces an analysis document. This results in:
- task_specification.md (feature description)
- task_metadata.json (if derived from a PRD)
- codebase_analysis.md (repository overview from Step 0)
- One analysis document per agent (developer, tester, senior, architect)
- context_pr.json (workflow state)
Design Consolidation (Step 2): The agents review each other’s analyses, highlight conflicts, and generate:
- consolidated_design.md (unified design)
- conflict_resolution.md (how disagreements were settled)
Design Finalization (Step 3): I review the consolidated design in a GitHub PR. My feedback is ingested and used to generate:
- finalized_design.md (production-ready design)
- feedback_incorporation_summary.md (what changed and why)
Ready for Development (Step 4): A bundle of documents (final design, codebase analysis, task spec, workflow context) becomes the blueprint for implementation.
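As an illustration of how these steps hang together, here is a minimal sketch of a step pipeline that passes the shared workflow state between stages. The structure and file names are taken from the steps above, but the code itself is an assumption, not the repo’s actual orchestration code.

```python
# Hypothetical sketch of the step pipeline: each step reads the shared
# workflow context, produces its documents, and writes the context back.
import json
from pathlib import Path


def load_context(path: Path) -> dict:
    return json.loads(path.read_text()) if path.exists() else {"step": 0, "documents": []}


def save_context(path: Path, ctx: dict) -> None:
    path.write_text(json.dumps(ctx, indent=2))


def run_step(ctx: dict, name: str, outputs: list[str]) -> dict:
    # In the real workflow this is where the agents are invoked and the
    # resulting markdown documents are written and pushed to the PR.
    ctx["step"] += 1
    ctx["documents"].extend(outputs)
    print(f"Step {ctx['step']}: {name} -> {', '.join(outputs)}")
    return ctx


if __name__ == "__main__":
    context_path = Path("context_pr.json")  # per-PR workflow state
    ctx = load_context(context_path)
    ctx = run_step(ctx, "analysis", ["codebase_analysis.md", "developer_analysis.md",
                                     "tester_analysis.md", "senior_engineer_analysis.md",
                                     "architect_analysis.md"])
    ctx = run_step(ctx, "design consolidation", ["consolidated_design.md",
                                                 "conflict_resolution.md"])
    ctx = run_step(ctx, "design finalization", ["finalized_design.md",
                                                "feedback_incorporation_summary.md"])
    save_context(context_path, ctx)
```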
Here’s the document flow diagram:
graph TB
%% Input Sources
subgraph Input Sources
PRD["PRD File\nproduct_requirements.md"]
TaskFile["Task File\ntask.md"]
Feature["Feature Name\nexample User Authentication"]
end
%% Step 1: Analysis
subgraph Step 1 Analysis
ExtractFeature["Senior Engineer\nExtracts Feature"]
CreateDocs1["Generate Documents"]
subgraph Step 1 Outputs
TaskSpec["task_specification.md\nExtracted or Original Feature"]
TaskMeta["task_metadata.json\nPRD source info"]
CodebaseAnalysis["codebase_analysis.md\nRepository patterns"]
subgraph Round 1 Analysis
ArchAnalysis["architect_analysis.md"]
DevAnalysis["developer_analysis.md"]
SeniorAnalysis["senior_engineer_analysis.md"]
TesterAnalysis["tester_analysis.md"]
end
Context1["context_pr_XXX.json\nWorkflow state"]
end
end
%% GitHub PR 1
subgraph GitHub PR After Step 1
PR1["Pull Request #XXX\nContains all analysis docs"]
Comments1["Your Comments\non Analysis Docs"]
end
%% Step 2: Design Consolidation
subgraph Step 2 Design Consolidation
LoadContext2["Load Context\nand Analysis"]
FetchComments2["Fetch PR Comments"]
Consolidate["Agents Review\nEach Others Work"]
ResolveConflicts["Resolve Conflicts"]
subgraph Step 2 Outputs
ConsolidatedDesign["consolidated_design.md\nUnified design document"]
ConflictResolution["conflict_resolution.md\nHow conflicts were resolved"]
Context2["Updated context_pr_XXX.json"]
end
end
%% GitHub PR 2
subgraph GitHub PR After Step 2
PR2["Same PR #XXX\nNow has design doc"]
Comments2["Your Comments\non Design Document"]
end
%% Step 3: Finalize Design
subgraph Step 3 Finalize Design
LoadContext3["Load Context\nand Design"]
FetchComments3["Fetch PR Comments\non Design"]
CategorizeFeedback["Categorize Feedback"]
UpdateDesign["Update Design\nBased on Feedback"]
subgraph Step 3 Outputs
FinalDesign["finalized_design.md\nProduction ready design"]
FeedbackSummary["feedback_incorporation_summary.md\nWhat changed and why"]
Context3["Final context_pr_XXX.json"]
end
end
%% Final Documents for Development
subgraph Ready for Development
FinalDocs["Documents for Step 4:\n- finalized_design.md\n- codebase_analysis.md\n- task_specification.md\n- context_pr_XXX.json"]
end
%% Connections - Input Flow
PRD --> |--prd-file| ExtractFeature
Feature --> |--feature| ExtractFeature
TaskFile --> |Direct input| CreateDocs1
ExtractFeature --> TaskSpec
ExtractFeature --> TaskMeta
%% Step 1 Flow
TaskSpec --> CreateDocs1
CreateDocs1 --> CodebaseAnalysis
CreateDocs1 --> ArchAnalysis
CreateDocs1 --> DevAnalysis
CreateDocs1 --> SeniorAnalysis
CreateDocs1 --> TesterAnalysis
CreateDocs1 --> Context1
%% Step 1 to GitHub
ArchAnalysis --> PR1
DevAnalysis --> PR1
SeniorAnalysis --> PR1
TesterAnalysis --> PR1
CodebaseAnalysis --> PR1
%% User Feedback 1
PR1 --> Comments1
Comments1 --> |Feedback on analysis| FetchComments2
%% Step 2 Flow
Context1 --> |--pr XXX| LoadContext2
LoadContext2 --> Consolidate
FetchComments2 --> Consolidate
Consolidate --> ResolveConflicts
ResolveConflicts --> ConsolidatedDesign
ResolveConflicts --> ConflictResolution
ConsolidatedDesign --> Context2
%% Step 2 to GitHub
ConsolidatedDesign --> PR2
ConflictResolution --> PR2
%% User Feedback 2
PR2 --> Comments2
Comments2 --> |Feedback on design| FetchComments3
%% Step 3 Flow
Context2 --> |--pr XXX| LoadContext3
ConsolidatedDesign --> LoadContext3
LoadContext3 --> CategorizeFeedback
FetchComments3 --> CategorizeFeedback
CategorizeFeedback --> UpdateDesign
UpdateDesign --> FinalDesign
UpdateDesign --> FeedbackSummary
FinalDesign --> Context3
%% Final Output
FinalDesign --> FinalDocs
CodebaseAnalysis --> FinalDocs
TaskSpec --> FinalDocs
Context3 --> FinalDocs
%% Styling
classDef input fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef process fill:#fff3e0,stroke:#e65100,stroke-width:2px
classDef output fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef github fill:#c8e6c9,stroke:#1b5e20,stroke-width:2px
classDef final fill:#ffebee,stroke:#b71c1c,stroke-width:3px
class PRD,TaskFile,Feature input
class ExtractFeature,CreateDocs1,LoadContext2,FetchComments2,Consolidate,ResolveConflicts,LoadContext3,FetchComments3,CategorizeFeedback,UpdateDesign process
class TaskSpec,TaskMeta,CodebaseAnalysis,ArchAnalysis,DevAnalysis,SeniorAnalysis,TesterAnalysis,Context1,ConsolidatedDesign,ConflictResolution,Context2,FinalDesign,FeedbackSummary,Context3 output
class PR1,Comments1,PR2,Comments2 github
class FinalDocs final
The richness of this process surprised me — by the time development starts, I already have a codebase analysis, per-agent perspectives, a consolidated design, conflict notes, a feedback summary, and a finalized design doc.
The Glue: MCP + PR Feedback
This workflow wouldn’t work without my MCP server. It:
- reads PR comments, processes them, and feeds them back to the agents.
- posts replies directly into GitHub PRs.
- runs custom tools (e.g., parsing build logs, surfacing lint errors, or reading test outputs).
- manages state and context across steps.
Essentially, it acts as the orchestration backbone — ensuring agents don’t just generate documents in isolation, but participate in a continuous, feedback-driven process.
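For a sense of what one of these tools looks like, here is a simplified sketch of a PR-comment fetcher, assuming the official MCP Python SDK (the `mcp` package) and the GitHub REST API. It is an illustration, not the actual server code from the repo.

```python
# Hypothetical sketch of one MCP tool, not the actual server from the repo.
# Assumes the official MCP Python SDK and a GitHub token in the environment.
import os

import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("github-pr-feedback")


@mcp.tool()
def fetch_pr_comments(owner: str, repo: str, pr_number: int) -> list[dict]:
    """Return the conversation comments on a pull request so agents can ingest feedback."""
    resp = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments",
        headers={"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"},
        timeout=30,
    )
    resp.raise_for_status()
    return [{"author": c["user"]["login"], "body": c["body"]} for c in resp.json()]


if __name__ == "__main__":
    mcp.run()  # serve the tool over stdio for the agent runtime to call
```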
The Collaboration Problem
One challenge I haven’t solved is how to create a truly collaborative environment. In a real-world team, discussions are messy: people jump in with half-baked ideas, questions, and counterpoints in no particular order. I haven’t figured out how to reproduce that with agents. Instead, my workflow is serialized: one agent at a time, in a fixed order. That comes with trade-offs:
- Bias: the first agent to write sets the tone for everyone else.
- Stopping criteria: when is enough iteration enough?
This serialization works, but it feels more rigid than human collaboration.
Context Engineering: The Real Challenge
Another hard problem was context engineering (see https://www.promptingguide.ai/guides/context-engineering-guide). The key wasn’t just writing prompts — it was carefully deciding what information each agent should see, when.
Some of the pitfalls I ran into:
- Agents generating giant design docs with entire chunks of code embedded.
- Agents producing inconsistent or even empty documents.
The coding agent that I used to build this workflow kept embedding concrete details (like class names or script names) directly into prompts — I had to keep reminding it: extract details from the repo, don’t hardcode them, keep it generic.
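To show what “deciding what each agent sees, when” can look like in practice, here is a simplified sketch of per-step context selection. The document names come from the workflow above, but the structure and helper are assumptions, not the repo’s actual code.

```python
# Hypothetical sketch: explicitly whitelist which documents each agent sees at
# each step, instead of dumping everything (or raw code) into every prompt.
from pathlib import Path

CONTEXT_PLAN = {
    # Step 1: every agent sees the task spec and the codebase analysis, nothing else.
    1: {"*": ["task_specification.md", "codebase_analysis.md"]},
    # Step 2: each agent additionally sees the other agents' analyses.
    2: {"*": ["task_specification.md", "codebase_analysis.md",
              "developer_analysis.md", "tester_analysis.md",
              "senior_engineer_analysis.md", "architect_analysis.md"]},
    # Step 3: only the consolidated design and the PR feedback are needed.
    3: {"*": ["consolidated_design.md", "conflict_resolution.md"]},
}


def build_context(step: int, agent: str, docs_dir: Path, max_chars: int = 20_000) -> str:
    """Concatenate only the allowed documents, truncated to keep prompts small."""
    allowed = CONTEXT_PLAN[step].get(agent, CONTEXT_PLAN[step]["*"])
    parts = []
    for name in allowed:
        path = docs_dir / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()[:max_chars]}")
    return "\n\n".join(parts)
```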
And one more practical problem: Sourcegraph AMP lets agents persist, but their local state is tied to the filesystem. I had to figure out how to:
- Start each agent in its own scratch location.
- Switch them into the actual codebase directory without losing context.
Sounds simple — but too often, the response I’d get was: “I can’t find any code.” Solving that required a surprising amount of trial and error.
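Stripped of the AMP-specific details, the workaround amounts to giving each agent its own scratch workspace while always spelling out the repo path explicitly, rather than relying on the agent’s working directory. The sketch below is a simplified illustration of that idea, with a placeholder repo path; it is not the actual AMP integration code.

```python
# Hypothetical sketch: per-agent scratch workspaces plus an explicit repo path,
# so an agent never has to guess where the code lives.
import tempfile
from pathlib import Path

REPO_ROOT = Path("/path/to/your/repo").resolve()  # placeholder path


def make_agent_workspace(agent_name: str) -> Path:
    """Create a scratch directory where the agent keeps its own notes and drafts."""
    return Path(tempfile.mkdtemp(prefix=f"agent_{agent_name}_"))


def initial_instructions(agent_name: str, scratch: Path) -> str:
    # Spell out both locations explicitly; vague "cd into the repo" instructions
    # were what produced the "I can't find any code" responses.
    return (
        f"You are the {agent_name} agent.\n"
        f"Write your working notes to: {scratch}\n"
        f"The codebase you are analyzing is at: {REPO_ROOT}\n"
        f"Always read and reference files under {REPO_ROOT}, not your scratch directory."
    )
```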
A Concrete Example
To test the workflow, I used a relatively simple feature: store in a database whether I already replied to a PR comment (to avoid duplicate replies).
Here’s how the agents responded:
- Fast Developer: proposed a quick schema and plan.
- Test-Conscious Developer: raised consistency concerns and suggested integration tests.
- Senior Engineer: simplified the schema, reduced unnecessary complexity, and enforced consistency.
- Architect: confirmed the new table aligned with the broader DB model.
The final design was more balanced and higher quality than any single agent’s output — and I could trace its evolution across the analysis, consolidation, and finalization stages.
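For reference, a minimal version of the kind of reply-tracking table this feature calls for might look like the sketch below. Table and column names are illustrative; the actual schema that came out of the workflow lives in the repo.

```python
# Illustrative sketch of a reply-tracking table using SQLite; names are
# hypothetical, not the exact schema produced by the workflow.
import sqlite3


def ensure_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS comment_replies (
            comment_id INTEGER PRIMARY KEY,   -- GitHub comment ID we replied to
            pr_number  INTEGER NOT NULL,
            replied_at TEXT    NOT NULL       -- ISO timestamp of our reply
        )
        """
    )


def already_replied(conn: sqlite3.Connection, comment_id: int) -> bool:
    row = conn.execute(
        "SELECT 1 FROM comment_replies WHERE comment_id = ?", (comment_id,)
    ).fetchone()
    return row is not None
```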
Artifacts
Some of the most useful artifacts that emerge from this workflow:
- Agent Prompts — charters defining each persona.
- Workflow Diagram — the document flow in Mermaid.
- Example Outputs — individual analysis vs. consolidated design.
All the code for this workflow is here: 👉 github.com/MarksStuff/github-agent/tree/main/multi_agent_workflow
Reflections
This multi-agent workflow is still experimental, but I’ve learned a few things:
- Agents focused on aspects you care about (dependency injection, testing discipline, scalability) can save you from repeating the same checks.
- Human in the loop is critical — PR comments act as arbitration between agents.
- Serialized workflows are a compromise — structured, but biased by order.
- Context engineering is the real bottleneck — harder and more fragile than prompt writing.
- State management is tricky — keeping agents “alive” while pointing them at the real codebase isn’t trivial.
Closing
I don’t think the future of AI in coding is “one agent that does everything.” It’s multi-agent workflows: diverse perspectives, structured collaboration, and better outcomes than any single agent can produce - similar to how we humans build software.
This workflow is my first attempt at that. It’s messy and imperfect, but it’s working.
👉 I’d love to hear: if you built your own multi-agent workflow, what roles did you use? And could you figure out a truly collaborative workflow?