Steering Agentic AI Towards High-Quality Code — Part 1: Planning


I poured some coffee, opened my Cursor editor, wrote a few prompt lines, and hit Enter. While I sipped and skimmed through articles, my AI agent breezed through the task and delivered flawless code before lunch…
And then I woke up.
In reality, I poured the same coffee, opened Cursor, and started a day full of prompt engineering, design decisions, code reviews, and tough trade-offs. Letting AI run loose on a vague task is a great way to watch it hurry forward — straight in the wrong direction. I’ve met too many people disappointed by the reality: output that’s unpredictable, verbose, overly complex, and sometimes just plain broken.
That said, I still use tools like Cursor daily — it’s an outstanding piece of software. But one of its core limitations is that it can’t read minds. LLMs are trained on unimaginable amounts of code and text, and they understand generic concepts well. What they lack is local knowledge — the context, conventions, and quirks of a specific project.
To provide the necessary context, we need to start with a solid plan. I usually kick off projects with two files (I got the idea from here). Both serve as “context memory” for the AI agents, providing a persistent reference for the project’s goals:
PLANNING.md — defines the design, architecture, and approach.
TASK.md — breaks the issue into smaller, iterative steps to achieve the final goal.
Luckily, we don’t have to do this fully manually — we can ask our AI partner to help with this task as well.
Example
Let’s look at a real-world case: adding OIDC (OpenID Connect) support to Karapace, an open-source Schema Registry for Apache Kafka written in Python. The goal: enable Single Sign-On and simplify authentication, while keeping authorization in place to control who can read, write, or manage schemas.
(Disclaimer: This feature was recently implemented by another contributor, but it’s still a great opportunity to compare a professional solution with one generated by Agentic AI.)
Good news—you can follow this article without being an expert in the technologies involved. I chose this example because it’s a mid-size feature in an established project — exactly the kind of task where AI assistance can be tricky. AI agents thrive in greenfield projects, where boilerplate code is plentiful and design decisions are fresh. In mature codebases, you have to integrate into existing architectures, follow long-standing trade-offs, and respect historical design patterns.
While OIDC is a relatively generic problem that doesn’t require deep upfront design, a structured plan still helps ensure the AI produces code that fits seamlessly into the existing system.
Step 0: Problem Research
Before jumping into planning, I like to do quick research to better guide the AI. This can include:
Exploring the current codebase: “Explain to me how authentication and authorization work in this project.”
Requesting a simplified explanation: “Describe in layman’s terms how OIDC works.”
Researching dependencies: “What existing solution could we use to test OIDC locally (without internet)? Suggest a few options and highlight the most practical one.”
In this case, the AI suggested:
“For local and integration tests, use the quay.io/keycloak/keycloak container.”
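Before planning, it’s worth a quick sanity check that the suggestion actually runs. Here’s a minimal sketch assuming the testcontainers Python package; the admin env vars and the start-dev command vary across Keycloak versions:

```python
# Sketch: spin up Keycloak for local tests (assumes the `testcontainers`
# package; env vars and `start-dev` vary by Keycloak version).
from testcontainers.core.container import DockerContainer

def start_keycloak() -> DockerContainer:
    container = (
        DockerContainer("quay.io/keycloak/keycloak")
        .with_env("KEYCLOAK_ADMIN", "admin")            # test-only credentials
        .with_env("KEYCLOAK_ADMIN_PASSWORD", "admin")
        .with_exposed_ports(8080)
        .with_command("start-dev")                      # dev mode: HTTP, in-memory DB
    )
    container.start()
    return container

# Usage: base_url = f"http://{kc.get_container_host_ip()}:{kc.get_exposed_port(8080)}"
```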
Doing this research first means our planning prompt will be richer, with practical details on how to verify the solution and how the end user will interact with it.
Step 1: Design-Only Mode: Keeping the Agent Out of Your Repo (For Now)
All those hours spent writing solid Jira tickets finally pay off: for planning, we need a clear problem description and explicit acceptance criteria.
Side note: If you already have a solid Jira (or similar) ticket, an MCP integration can fetch the issue for the agent, or the agent can draft a suitable ticket description.
For the problem definition, I used a focused prompt like this:
“I need a design and plan for adding OIDC support to this project. It must fit the current authorization strategy and architecture. Reuse existing dependencies, code, and modules where possible. Operators deploying Karapace must be able to configure the required parameters in src/karapace/core/config.py to use OIDC (e.g., sasl_oauthbearer_jwks_endpoint_url, sasl_oauthbearer_expected_issuer). Include only the necessary new configuration keys and avoid duplicates. Support multiple OIDC providers (Google, Azure, GitHub); for local and integration tests, use quay.io/keycloak/keycloak. Describe how tokens from the OIDC provider should be translated into the correct access rights (e.g., schema:read, subject:write). Provide high-level planning only; include brief code examples only if needed. Start by creating PLANNING.md with the architecture, design, and problem description.”
Yes, it’s a bit verbose, but it captures what matters; missing details in the first pass are fine, since we’ll iterate. Step 0 gave us the specifics for verifying the solution and for how users will enable it (via config). Apply the “toddler rule”: tell the agent what to do rather than what to avoid; prefer “provide only high-level planning” over “do not write code yet.”
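For orientation, the configuration surface the prompt asks for might look roughly like this; the key names come from the prompt itself, while the dataclass shape is my sketch, not Karapace’s actual config machinery:

```python
# Sketch of the requested configuration keys (names from the prompt;
# the dataclass is illustrative, not Karapace's real config code).
from dataclasses import dataclass

@dataclass(frozen=True)
class OIDCConfig:
    sasl_oauthbearer_jwks_endpoint_url: str | None = None  # where signing keys live
    sasl_oauthbearer_expected_issuer: str | None = None    # must equal the token's `iss`
    sasl_oauthbearer_expected_audience: str | None = None  # must equal the token's `aud`

    @property
    def oidc_enabled(self) -> bool:
        return self.sasl_oauthbearer_jwks_endpoint_url is not None
```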
Before running the prompt, add essential requirements and acceptance criteria—actionable, non‑conflicting checks that confirm the task is complete and correct:
Project fit: Reuses existing infrastructure/classes where appropriate.
Scoped impact: Lists files to modify or review.
Interface design: Clear, documented interfaces.
Algorithm/flow: Defined transition logic with a small example.
Validation rules: Comprehensive constraints and error handling.
Sequence diagram: Interactions between components are illustrated.
OIDC Integration Plan for Karapace — Outline
1. Problem Description — what we’re doing and why.
2. Architecture & Design
2.1 Configuration — how an operator configures OIDC.
2.2 OIDC Authorizer — required behavior and interfaces.
2.3 Role-to-Permission Mapping — how external roles map to existing ones (sketched after this outline).
2.4 Local & Integration Testing
3. Files to Modify — scope.
4. Validation Rules — user input checks.
5. Sequence Diagram — interactions to share with other developers.
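To give a feel for item 2.3, here’s the kind of translation the plan describes, with illustrative role and permission names rather than Karapace’s actual ACL model:

```python
# Sketch: translating OIDC roles into Karapace-style permissions.
# Role and permission names are illustrative, not the real ACL model.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "schema-reader": frozenset({"schema:read"}),
    "schema-writer": frozenset({"schema:read", "subject:write"}),
    "registry-admin": frozenset({"schema:read", "subject:write", "subject:delete"}),
}

def permissions_for(token_roles: list[str]) -> frozenset[str]:
    """Union of the permissions granted by every role present in the token."""
    granted: frozenset[str] = frozenset()
    for role in token_roles:
        granted |= ROLE_PERMISSIONS.get(role, frozenset())
    return granted
```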
With the plan in hand, gaps surface—planning is rarely perfect on the first pass, so iterate and harden the key areas.
Unhappy-path strategy & timeouts
"Add a strategy for unhappy paths. For example, when request volume is high or the OIDC provider is unresponsive, apply a circuit breaker. Use a fail-fast approach for obviously invalid tokens."
Asynchronous provider calls & throughput
"Make calls to the OIDC provider asynchronous and non‑blocking. The solution should handle ~1,000 requests per second concurrently."
Caching strategy (see the sketch below)
"Introduce a TTL‑based caching strategy. Implement a single‑flight mechanism when the TTL expires to prevent cache stampede. Use a task‑based approach that shares future results to avoid locks."
Extend role storage with OIDC mappings
"Add a plan to extend the current role storage (file‑based permissions) with OIDC role mappings, rather than relying solely on configuration."
Even if these are only a few lines in the plan, they materially impact performance and resilience. Premature optimization causes problems; it’s good that the agent starts simple. We just tailor the plan to address likely bottlenecks. The result: a new section in PLANNING.md — Resilience and Error Handling.
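To make the caching prompt concrete: the first caller starts a fetch task, every concurrent caller awaits that same task, and the result is reused until the TTL expires. A minimal asyncio sketch of the idea, not the agent’s actual output:

```python
# Sketch: TTL cache with single-flight refresh. Concurrent callers share
# one fetch task instead of taking locks. Illustrative, not Karapace code.
import asyncio
import time
from collections.abc import Awaitable, Callable

class SingleFlightTTLCache:
    def __init__(self, fetch: Callable[[], Awaitable[dict]], ttl_seconds: float) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._value: dict | None = None
        self._expires_at = 0.0
        self._inflight: asyncio.Task | None = None

    async def get(self) -> dict:
        if self._value is not None and time.monotonic() < self._expires_at:
            return self._value                       # fresh: serve from cache
        if self._inflight is None:                   # first caller triggers the fetch
            self._inflight = asyncio.ensure_future(self._refresh())
        return await asyncio.shield(self._inflight)  # everyone awaits the same task

    async def _refresh(self) -> dict:
        try:
            value = await self._fetch()              # e.g. fetch JWKS over HTTP
            self._value = value
            self._expires_at = time.monotonic() + self._ttl
            return value
        finally:
            self._inflight = None                    # allow the next refresh
```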
Security needs equal attention. The plan lists sasl_oauthbearer_expected_audience as optional. While technically allowed in OIDC, that’s risky—audience validation helps prevent token replay.
Enforce audience validation
"sasl_oauthbearer_expected_audience is marked optional, but that can introduce security risks; make it mandatory."
With the plan ready, the next move is to turn it into a list of concrete implementation steps.
Step 2: Tasks
Now that we’re happy with the high-level plan and have defined the what and why, it’s time to define how the work should be done:
- “Given the current plan, generate a file TASK.md with iterative steps to implement the feature described in PLANNING.md. Start by defining the public API and interfaces. Keep steps small and independently verifiable.”
In practice, if the planning file is clear and concise, Agentic AI will produce a solid task list. The piece I’m strict about is quality gates—small, objective checks that ensure each step doesn’t just “work,” but is good.
Baseline Quality Gates (apply to every step)
Mandatory unit tests for success and failure paths (logic only; no tests for logging); see the sketch after this list.
Passes mypy (type checks).
Passes ruff (linting).
Proper error handling (no swallowed exceptions; actionable messages).
Scope check: only files relevant to the current step are changed.
Logging policy: at least one error log per unhappy path; at most one log on success.
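As a concrete instance of the unit-test gate, each step ships a success-path and a failure-path test. In this sketch the unit under test is a trivial stand-in, not real Karapace code:

```python
# Sketch of the "success and failure paths" gate. The function under
# test is a hypothetical stand-in for whatever the current step builds.
import pytest

def parse_ttl(value: str) -> float:
    """Parse a cache TTL in seconds, rejecting non-positive values."""
    ttl = float(value)
    if ttl <= 0:
        raise ValueError(f"TTL must be positive, got {value!r}")
    return ttl

def test_parse_ttl_success() -> None:
    assert parse_ttl("300") == 300.0

def test_parse_ttl_rejects_non_positive() -> None:
    with pytest.raises(ValueError, match="must be positive"):
        parse_ttl("0")
```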
These gates let the agent self-correct while implementing, reducing manual review churn. Prevention is simple at this stage: keep TASK.md steps small, manageable, and easily revertible. Here is the first result:
OIDC Implementation Task List
Step 1: Configuration and Core Interfaces
Goal:
Define all necessary configuration settings and the core OIDCAuthorizer interface without implementing business logic. This provides a stable foundation for the next steps (an interface sketch follows after this list).
Tasks: [4 sub-tasks]
Step 2: Auth File Format Extension and Sample Files
Goal:
Create sample auth files with OIDC role mappings and verify the extended format works correctly.
Tasks: [3 sub-tasks]
Step 3: Asynchronous JWKS Fetching with Caching and Circuit Breaker
Goal:
Implement the mechanism for securely and resiliently fetching the JSON Web Key Set from the OIDC provider.
Tasks: [3 sub-tasks]
Step 4: Full Token Validation
Goal:
Implement the core security logic to validate an incoming OIDC token.
Tasks: [2 sub-tasks]
Step 5: Role Extraction and Authorization Logic
Goal:
Translate the validated token into application-specific permissions using the auth file-based role mapping system.
Tasks: [3 sub-tasks]
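For a feel of what Step 1’s interfaces-without-logic deliverable might look like, here’s a Protocol-style sketch; the names are my guesses, not the actual Karapace code:

```python
# Sketch: Step 1 deliverable, interfaces only (names are illustrative).
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class Principal:
    subject: str            # the token's `sub` claim
    roles: frozenset[str]   # roles extracted from the token

class OIDCAuthorizer(Protocol):
    async def authenticate(self, bearer_token: str) -> Principal:
        """Validate the token and return the caller's identity and roles."""
        ...

    def check(self, principal: Principal, permission: str) -> bool:
        """True if the principal holds e.g. `schema:read` or `subject:write`."""
        ...
```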
Later, we can add more detail, for example extra steps for integration tests and OpenTelemetry. If a feature is not too big, it can also make sense to generate the planning and task files in one go, since they depend on each other. The final result is available here.
Step 3: Implement, Pause, Iterate
Once the task list is ready, we can jump into implementation. There will almost certainly be edits and changes. That’s why it’s a good idea to explicitly mark where the agent should pause for verification within TASK.md.
Here’s the trade-off:
If you micromanage an agent and create lots of small tasks that each need to be checked, you burn time on hand-holding.
If the uninterrupted chunks of work are too big, generation slows down and cleanly reverting becomes painful.
A pragmatic middle path is to define short “verification waypoints” after critical steps. For example:
Stop after Step 1: open PR; run unit tests and static checks; request review.
Stop after Step 3: run integration smoke test; confirm JWKS caching and circuit breaker behavior with a forced failure scenario; only then proceed.
To reduce micromanagement time, lean on Evals, Guardrails, Rules, and Quality Gates—but that’s part of another story (Part 2).