How I Used AI to Simulate Multi-Agent Systems for Real DevOps Workflows

Leena Malhotra

DevOps is already complex.
Multi-service. Multi-environment. Multi-stakeholder.

So why are most AI tools built for single-turn interactions?

You ask a question.
It gives an answer.
Then you start over.

That’s not how real systems work.
That’s not how real teams work.
And it’s definitely not how DevOps works.

So I asked: What if I could simulate an actual DevOps scenario—across roles, tools, and decisions—using AI agents working together?

That experiment led me to build a simulated multi-agent DevOps environment using nothing but prompt chaining, context injection, and side-by-side model testing in Crompt, an AI chat platform that puts multiple top models (GPT-4o, Claude 3.5 Sonnet, Gemini 2.0 Flash, Mistral AI, and Grok 3 Mini) on one screen.

Here’s how I built it.
And why it changed how I think about both AI and operations.


Why Simulate DevOps Teams With AI?

Most DevOps problems aren’t technical—they’re systemic:

  • Conflicting priorities (infra vs feature delivery)

  • Poor incident postmortems

  • “Works on my machine” syndrome

  • Tooling fragmentation across environments

I didn’t want another chatbot that told me how to use kubectl.
I wanted an AI system that could behave like an actual cross-functional team, making decisions, asking questions, and resolving friction.


Step 1: Define the Agents as Roles

Instead of one giant prompt, I split the system into four core DevOps agents:

  • Agent 1: SRE Bot

    • Monitors logs

    • Flags incidents

    • Proposes recovery paths

  • Agent 2: CI/CD Analyst

    • Watches pipeline logs

    • Suggests fixes for failed deploys

  • Agent 3: Infra Planner

    • Manages IaC (Terraform) recommendations

    • Suggests scaling strategies

  • Agent 4: Release Manager

    • Evaluates trade-offs

    • Chooses action paths from other agents

Each agent was run through a different model via Crompt:

  • GPT-4o for structured, systems thinking

  • Claude 3.5 Sonnet for rationale-based evaluation

  • Gemini 2.0 Flash for speed and command suggestions

  • Mistral AI for terse infra optimization

Using Crompt’s multi-model dashboard, I could simulate how each “team member” would respond under pressure—just like in a real standup.
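If you'd rather script this setup than click through a UI, here's a minimal sketch of the role definitions. The call_model() helper is a hypothetical stand-in for whatever client you actually use, and the model ID strings are placeholders, not exact Crompt identifiers:

```python
# Hypothetical helper: wire this to whatever client you actually use.
# Model ID strings below are placeholders, not exact identifiers.
def call_model(model: str, system: str, prompt: str) -> str:
    raise NotImplementedError("connect to your model provider here")

# Each agent = a role (system prompt) pinned to a specific model.
AGENTS = {
    "sre_bot": {
        "model": "gpt-4o",
        "system": ("You are an SRE. Monitor the logs provided, flag "
                   "incidents, and propose concrete recovery paths."),
    },
    "cicd_analyst": {
        "model": "gemini-2.0-flash",
        "system": ("You are a CI/CD analyst. Review pipeline logs and "
                   "suggest fixes for failed deploys."),
    },
    "infra_planner": {
        "model": "mistral-large",
        "system": ("You are an infrastructure planner. Recommend Terraform "
                   "changes and scaling strategies, tersely."),
    },
    "release_manager": {
        "model": "claude-3.5-sonnet",
        "system": ("You are a release manager. Weigh trade-offs between "
                   "the other agents' proposals and choose an action path."),
    },
}

def run_agent(name: str, prompt: str) -> str:
    agent = AGENTS[name]
    return call_model(agent["model"], agent["system"], prompt)
```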


Step 2: Inject Shared Context Dynamically

To make this work, I needed shared state.
A way for each agent to access logs, commits, and decisions from the others.

Crompt’s Document Summarizer came in clutch here.
I’d upload pipeline logs, deploy manifests, and recent incident reports as context files.

Then, each agent prompt would begin with:

```txt
Context:
- Incident ID: XYZ-227
- Affected services: auth-api, user-gateway
- Last commit: hotfix rollback, SHA: a3f1c
- Summary of prior agent suggestions: [Crompt summary output]
```

Each model could now "see" the same situation and act on what the others had already suggested.
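In script form, that injection is just a header builder over a shared-state dict. A minimal sketch, using the same incident values as above:

```python
SHARED_STATE = {
    "incident_id": "XYZ-227",
    "affected_services": ["auth-api", "user-gateway"],
    "last_commit": "hotfix rollback, SHA: a3f1c",
    "prior_suggestions": "",  # filled in as agents respond
}

def build_context_header(state: dict) -> str:
    """Render the shared state as the header every agent prompt starts with."""
    return (
        "Context:\n"
        f"- Incident ID: {state['incident_id']}\n"
        f"- Affected services: {', '.join(state['affected_services'])}\n"
        f"- Last commit: {state['last_commit']}\n"
        f"- Summary of prior agent suggestions: "
        f"{state['prior_suggestions'] or 'none yet'}\n"
    )

def agent_prompt(state: dict, task: str) -> str:
    return build_context_header(state) + "\nTask:\n" + task
```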


Step 3: Chain Agent Outputs in Sequence

Rather than run them all at once, I simulated time steps.

Example workflow:

  1. SRE Bot detects high memory usage on a pod.

  2. CI/CD Analyst checks if the last pipeline touched that service.

  3. Infra Planner suggests vertical scaling vs node autoscaling.

  4. Release Manager weighs risk, proposes mitigation path.

Each step used Crompt’s memory-based chaining by summarizing the prior agent's response and embedding it into the next agent’s prompt.

You can do this via their CLI or directly in the platform.
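Scripted end to end, the time-step loop looks roughly like this. It reuses run_agent(), agent_prompt(), and SHARED_STATE from the earlier sketches, and summarize() is a stand-in for the summarization pass (I used Crompt's summarizer for that):

```python
def summarize(text: str) -> str:
    # Stand-in for the summarization pass (I used Crompt's Document
    # Summarizer); any condensing model call works here.
    return call_model("gpt-4o", "Summarize in three short bullets.", text)

# One simulated time step per agent, in the order above.
STEPS = [
    ("sre_bot",         "A pod is showing high memory usage. Flag and diagnose."),
    ("cicd_analyst",    "Check whether the last pipeline run touched the affected service."),
    ("infra_planner",   "Recommend vertical scaling vs node autoscaling."),
    ("release_manager", "Weigh the risk and propose a mitigation path."),
]

for name, task in STEPS:
    response = run_agent(name, agent_prompt(SHARED_STATE, task))
    # The chaining step: fold a summary of this response into the shared
    # state so the next agent's prompt includes it.
    SHARED_STATE["prior_suggestions"] += f"\n[{name}] {summarize(response)}"
```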


Step 4: Evaluate Responses in Conflict

Sometimes agents disagreed. That was the goal.

Claude might say: “Add a resource limit to prevent overprovisioning.”
Mistral might reply: “Skip autoscaling; the spike is short-lived.”

Now I had the same friction we see in real teams—and a chance to practice decision-making under uncertainty.

In most cases, I’d feed both responses into a fifth "Evaluator Agent" (run on GPT-4o or Claude), asking:

```txt
Compare the recommendations. Which is more viable given SLAs, current load, and prior incidents?
```

The results? Surprisingly insightful.
Like getting a second (or third) opinion from teammates with very different expertise.
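In script form, the evaluator is just one more agent call fed both responses. A sketch, reusing the hypothetical call_model() from earlier:

```python
def evaluate_conflict(rec_a: str, rec_b: str) -> str:
    """Fifth agent: compare two conflicting recommendations."""
    prompt = (
        f"Recommendation A:\n{rec_a}\n\n"
        f"Recommendation B:\n{rec_b}\n\n"
        "Compare the recommendations. Which is more viable given SLAs, "
        "current load, and prior incidents? Justify the choice."
    )
    return call_model(
        "claude-3.5-sonnet",  # or gpt-4o; both worked as evaluators for me
        "You are a neutral evaluator on a DevOps team.",
        prompt,
    )

# Example: the Claude vs Mistral disagreement above
verdict = evaluate_conflict(
    "Add a resource limit to prevent overprovisioning.",
    "Skip autoscaling; the spike is short-lived.",
)
```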


Optional: Visualizing State With AI Tools

To debug complexity, I used Crompt’s Chart & Diagram Generator.
It visualized:

  • Dependency graphs

  • Pod-to-service maps

  • Timeline of the simulated incident

Now, even non-technical stakeholders could see what the agents were “thinking.”
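Crompt generated these for me, but if you want to script a quick version yourself, here's a minimal sketch that emits a Mermaid dependency graph from the simulated state (the service map here is illustrative, not from a real cluster):

```python
def to_mermaid(deps: dict[str, list[str]]) -> str:
    """Render a service dependency map as a Mermaid graph definition."""
    lines = ["graph TD"]
    for service, downstream in deps.items():
        for dep in downstream:
            lines.append(f"    {service} --> {dep}")
    return "\n".join(lines)

# Illustrative pod-to-service map from the simulated incident
print(to_mermaid({
    "user-gateway": ["auth-api"],
    "auth-api": ["sessions-db"],  # hypothetical downstream dependency
}))
```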


What This Unlocked

Better root cause analysis
Simulating incident retros with Claude + GPT-4o uncovered issues I missed manually.

Faster recovery playbooks
Agent chains wrote clear recovery steps that doubled as documentation.

Pre-production stress testing
Simulated failure scenarios with agent responses let me prepare for edge cases proactively.


Final Take: This Isn’t “Just Prompting”—It’s System Design

Most devs treat AI as a fancier search engine.
But when you treat it like a multi-agent system, AI becomes a kind of cognitive infrastructure.

You’re not replacing your team.
You’re simulating the conversations you'd have—before things break.

And platforms like Crompt make that possible, because they give you access to:

  • All top AI models in one place

  • Model-to-model comparison

  • Tools for summarization, diagrams, memory, and chaining

So if you’re building complex systems—or just want to practice smarter decision-making in production-level workflows—don’t prompt harder.

Think like a system architect.
Simulate like a DevOps team.
And let AI do the coordination for you.

-Leena:)
