From Plans to Production: Using LLMs to Generate Strategy


Large Language Models (LLMs) are increasingly used to suggest not just answers but actions. From business automation rules to trading strategies and decisions, LLMs can produce structured guidance that machines or people could execute. But that raises a critical question:
What happens when the LLM gets it wrong — and the plan is immediately executed?
In high-stakes environments like finance, security, or logistics, a bad decision isn't just an embarrassing mistake; it can mean real-world losses.
That’s why one of the most promising patterns in responsible AI development is LLM-based guidance generation without execution — where the model creates a "recipe" or plan of actions that gets evaluated before anything goes live.
The Safer Pattern: Generate, Evaluate, Promote
Instead of having an LLM directly control live systems, we separate the process into three phases:
Generate – The LLM proposes a structured set of actions or rules.
Evaluate – Those actions are reviewed and tested in a dev or sandbox environment.
Promote – If the outputs meet predefined success and safety criteria, they’re moved into production.
This approach maintains flexibility and creativity while preserving human and system oversight.
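For concreteness, here is a minimal Python sketch of that loop. The helper names (llm_generate_plan, run_in_sandbox, promote_to_production) and the report fields are hypothetical placeholders for your own LLM call, sandbox harness, and deployment step, not any specific library.

from typing import Optional

def llm_generate_plan(context: dict) -> dict:
    # Placeholder: call your LLM here and parse its structured output.
    return {"actions": ["..."], "rationale": "..."}

def run_in_sandbox(plan: dict) -> dict:
    # Placeholder: execute the plan against simulated data only.
    return {"passed_safety_checks": True, "score": 0.8}

def promote_to_production(plan: dict) -> None:
    # Placeholder: deploy the validated plan to the live system.
    print("promoted:", plan)

def generate_evaluate_promote(context: dict, max_attempts: int = 3) -> Optional[dict]:
    for _ in range(max_attempts):
        plan = llm_generate_plan(context)        # 1. Generate
        report = run_in_sandbox(plan)            # 2. Evaluate
        if report["passed_safety_checks"] and report["score"] >= context["min_score"]:
            promote_to_production(plan)          # 3. Promote
            return plan
        context["feedback"] = report             # feed test results into the next attempt
    return None                                  # nothing met the bar; escalate to a human

generate_evaluate_promote({"goal": "example task", "min_score": 0.7})

The feedback written back into the context is what lets a later attempt improve on a rejected plan instead of starting from scratch.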
Example: An Algo Trading Strategy
Let’s say you want an LLM to help create a new algorithmic trading rule.
The LLM receives market data and account parameters.
It generates a strategy: for example, "Buy ETH when it drops more than 3% in 4 hours and sell if it gains 5% after purchase."
This strategy is not yet live.
It’s first tested in a sandbox wallet — with fake money and simulated markets.
After a period of observation, if the rule performs well and remains within acceptable risk thresholds, it is promoted to the production wallet.
✅ The LLM writes the playbook.
❌ It doesn't run the playbook until it's approved.
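As a rough sketch of how this could look in code, the strategy above can be expressed as structured data and gated by a sandbox backtest. The field names, the backtest_in_sandbox helper, and the risk thresholds below are illustrative assumptions, not a real trading API.

# Illustrative only: field names, numbers, and helpers are assumptions.
strategy = {
    "asset": "ETH",
    "entry": {"condition": "price_drop_pct", "threshold": 3.0, "window_hours": 4},
    "exit": {"condition": "gain_pct_since_entry", "threshold": 5.0},
}

def backtest_in_sandbox(strategy: dict, simulated_prices: list[float]) -> dict:
    # Placeholder: replay the rule against simulated market data with a fake wallet.
    return {"return_pct": 4.2, "max_drawdown_pct": 6.1}

def should_promote(report: dict, max_drawdown_pct: float = 10.0, min_return_pct: float = 0.0) -> bool:
    # Promote only if the sandbox run stays within the risk budget and is profitable.
    return report["max_drawdown_pct"] <= max_drawdown_pct and report["return_pct"] > min_return_pct

report = backtest_in_sandbox(strategy, simulated_prices=[2400.0, 2310.0, 2440.0])
print("promote to production wallet" if should_promote(report) else "keep in sandbox")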
Example: Planning a Route With Constraints
Imagine using an LLM to help a drone or autonomous agent move from Point A to Point B.
The LLM generates multiple potential paths (with estimated time, energy use, risk level, etc.).
Each path is evaluated in simulation.
Only paths under a specific risk score are passed to the real agent.
For instance:
{
  "paths": [
    {"route": "A > C > D > B", "duration_min": 7, "risk_score": 0.3},
    {"route": "A > X > Y > B", "duration_min": 6, "risk_score": 0.7},
    {"route": "A > B", "duration_min": 5, "risk_score": 0.9}
  ],
  "threshold": 0.5
}
Only the path with a risk score below 0.5 is sent to the real-world executor.
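A minimal sketch of that gating step, reusing the example plan above (the plan_json string and the fastest-path tie-breaker are illustrative choices):

import json

plan_json = """
{
  "paths": [
    {"route": "A > C > D > B", "duration_min": 7, "risk_score": 0.3},
    {"route": "A > X > Y > B", "duration_min": 6, "risk_score": 0.7},
    {"route": "A > B", "duration_min": 5, "risk_score": 0.9}
  ],
  "threshold": 0.5
}
"""

plan = json.loads(plan_json)
# Keep only routes whose risk score is below the configured threshold.
safe_paths = [p for p in plan["paths"] if p["risk_score"] < plan["threshold"]]

if safe_paths:
    # Among the safe routes, prefer the fastest one.
    chosen = min(safe_paths, key=lambda p: p["duration_min"])
    print("send to executor:", chosen["route"])   # -> "A > C > D > B"
else:
    print("no path meets the risk threshold; fall back to human review")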
This ensures that the LLM is used for creative exploration and intelligent suggestion, while final decisions are made based on rules, risk thresholds, or human review.
Benefits of This Architecture
Safety: Prevents unvalidated LLM outputs from triggering costly mistakes
Auditability: Keeps a clear log of what was generated, why it was selected, and how it was promoted
Adaptability: Allows LLMs to suggest edge-case solutions that humans may not consider
Continuous improvement: Feedback from tests can be used to retrain or reinforce better strategies over time
Final Thought
LLMs are fantastic strategy machines — but dangerous operators. By keeping them in the planner seat rather than the driver's seat, we get the best of both worlds: fast ideation, safe execution.
In a world where AI is increasingly asked not just to think but to act, this middle ground — generate, evaluate, promote — may be the architecture that keeps AI useful, practical, and safe.