Limitations of AI Coding Agents

If you're only interested in knowing the limitations of AI coding agents, feel free to skip ahead to Part 3.
Part 1: The Rise of AI IDEs & My First Experiences
It’s been about three years since the release of GPT-3.5. I wasn’t an early adopter — I first heard about it on social media but didn’t pay much attention. It wasn’t until 6–8 months after its release that I gave it a try. Back then, I had no idea how to use it effectively. I was new not only to the tool itself but also to the entire AI ecosystem surrounding it.
Things changed when a task came up during a sprint at work. The backend server was sending normalized data via a query, and on the frontend, I needed to display it in different categories. Although the data came from a table, my task was to present it in a categorized list format. The challenge was to organize the data into three groups: current month, past months, and future months. Within each group, entries had to be sorted by created_at in descending order. Additionally, when the user visited the page, the scrollbar needed to land directly on the current month’s data. Scrolling up would reveal past months, while scrolling down would show future months.
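Looking back, the core grouping-and-sorting logic can be sketched in a few lines of TypeScript. This is a hedged reconstruction, not the original code: the `Entry` shape and the `created_at` field name are assumptions for illustration.

```typescript
// Hypothetical entry shape — field names are assumptions for illustration.
type Entry = { id: number; created_at: string };

type Grouped = { past: Entry[]; current: Entry[]; future: Entry[] };

// Split entries into past / current / future months relative to `now`,
// then sort each group by created_at in descending order.
function groupByMonth(entries: Entry[], now: Date = new Date()): Grouped {
  const groups: Grouped = { past: [], current: [], future: [] };
  for (const e of entries) {
    const d = new Date(e.created_at);
    const key =
      d.getFullYear() === now.getFullYear() && d.getMonth() === now.getMonth()
        ? "current"
        : d < now ? "past" : "future";
    groups[key].push(e);
  }
  for (const g of [groups.past, groups.current, groups.future]) {
    // ISO timestamps sort correctly as strings; reverse for newest-first.
    g.sort((a, b) => b.created_at.localeCompare(a.created_at));
  }
  return groups;
}
```

The scroll behavior would then be a matter of rendering the groups in order (past, current, future) and calling `scrollIntoView` on the first element of the current-month group on mount.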
At first glance, the task might seem logically simple — maybe even the implementation. But at the time, I was a junior developer, fairly new to JavaScript and TypeScript. (Up until then, most of my experience had been with Java. I understand that logic is independent of the programming language.) So I decided to lean on ChatGPT. I described the problem in detail, and to my surprise, it gave me a solid explanation along with a clear code implementation strategy. After a bit of back-and-forth and a lengthy conversation, I was able to successfully complete the feature.
That was the moment it hit me. Before using ChatGPT, implementing something unfamiliar involved a ton of friction. You know the drill: 20+ browser tabs open 😉 — documentation, articles, Stack Overflow — trying out multiple approaches to get one thing done. But with GPT, the process felt almost frictionless. It could give me exactly what I needed.
From that day to now, I’ve become a heavy user of AI tools — for my day job, side projects, personal tasks, and just about anything else I’m working on. Over time, these tools have evolved rapidly. Today, AI-powered IDEs can build entire full-stack web applications from natural language prompts. Tools like bolt.new, Cursor, Replit, and GitHub Copilot have entered the scene. Among them, GitHub Copilot has been my go-to for the past 6–7 months. One major reason: it’s free (up to a token limit), which made it easy for me to try. After just a few days of regular use, I was amazed.
Copilot drastically sped up my development. It handled the boring, repetitive tasks I always dreaded — things like writing model files for database schemas in ORMs, setting up projects, implementing custom auth flows, and more. (We’ll come back to Copilot in a bit.)
Part 2: Where AI Shines: Copilot, bolt.new, and Developer Productivity
When AI Tools Surprised Me — Again
Somewhere along the way, I was introduced to bolt.new — a fully AI-driven IDE running on web containers, created by StackBlitz. Now, I’d been a long-time StackBlitz user even before the GPT-era began, mainly because I worked extensively with Angular. Back then, I often found valuable solutions and implementation ideas via StackBlitz examples.
But back to bolt.new — one day at my day job, I was reassigned to a different project with an urgent requirement. I remember it clearly: it was a Wednesday afternoon, and I was asked to build a front-end application (pure client-side) to simulate email processing done by our server. In short, we were developing an automation tool that processed incoming emails — it would read messages, extract intents and entities, create tickets for customer support, draft replies, and send responses back to customers.
To showcase this complex backend processing to potential investors, we needed a UI tool that visualized each step in real time.
I knew right away — this was nearly impossible to build within 2 days (they wanted it ready by Friday EOD). And although I considered myself a strong front-end developer, I also knew this wasn’t a quick job. So, I decided to take a chance — I paid $20 from my own pocket for a bolt.new subscription.
And guess what? I completed the entire app before the Friday deadline. That was the second major moment where an AI tool genuinely stunned me.
GitHub Copilot
Now, circling back to GitHub Copilot — over the past 3 months, I’ve been working on a side project called PromptOptimizer. In simple terms, this tool helps:
Developers craft more effective prompts
Business owners save costs on LLM services via token optimization
Instead of building the app inside an AI-driven IDE or with agents, I chose to build it from scratch, step by step and feature by feature. I wanted full control and a deep understanding of every part. However, I absolutely leveraged GitHub Copilot along the way.
At the time, Copilot offered two primary modes (there are three now, including the new Agent Mode):
Ask Mode – You explain your requirement, and it responds with an approach and detailed implementation steps.
Edit Mode – It directly edits your files or codebase based on your instructions.
No doubt, it was a huge productivity booster. I estimate it reduced my development time by 50–60%.
Want to implement authentication? Just ask.
Need state management set up? Ask again.
Within minutes, your scaffolding is ready, often with explanation and code together.
Part 3: Where AI Fails: Design Decisions and Real-World Shortcomings
The Bigger Question
After experiencing the true capabilities of these tools, I don’t find myself asking “Will AI take engineers’ jobs?” anymore. The real question is: When?
But that’s a topic for another day.
Despite the impressive capabilities and the convenience these tools provide, it's important to recognize they still have limitations, flaws, and inherent weaknesses.
Every tool’s creator markets it based on what it can do — its capabilities. At the same time, a segment of social media influencers and tech evangelists stay relentlessly optimistic, often pushing the idea that AI will completely transform development.
Beyond the Demos: Where AI Starts Falling Short
Many influencers on social media are already boldly claiming that AI is replacing human engineers — not just someday, but right now. Maybe they’re right in the long run, but I often notice something they all have in common: they’re usually just demonstrating simple CRUD applications. Nothing close to what we build in the complexity of real-world products.
For marketing purposes, companies also showcase their tools' best features in clean, polished demo videos. But rarely do I see anyone building a fully-fledged, complex application using these tools alone. (Then again, I could be wrong — maybe I just haven’t come across the right examples.)
Over the past few months, I’ve used GitHub Copilot extensively, and one clear limitation stands out: these AI tools struggle when it comes to design and architectural decision-making.
Sure, they can build beautiful landing pages, simple websites, CRUD-based full-stack apps, and even mobile apps. (In fact, I recently discovered they’re pretty good at building desktop apps and Chrome extensions, too.)
But Here's the Catch
When I say AI tools are poor at architectural or design-level decisions, I mean this: with the rise of agentic features in AI-driven IDEs, users expect to interact via natural language and get thoughtful, production-grade responses.
Let me give you a scenario.
Let’s say I ask an AI agent to implement a logger in my app. I don’t just want it to create a logger — I want it to evaluate the entire impact:
How will the logger affect API response times?
Is the logger scalable?
What’s the cost of the logging service?
If it stores logs in a database, will it account for read/write operations?
If it uses object storage (not DB), will it handle file update performance, pricing, and even Row Level Security (RLS)?
A junior developer might prompt something like:
“Implement custom logger functionality in this application and modify route.ts to log errors.”
And yes — AI will implement it fairly well. But a skilled engineer knows this isn’t enough for a production-grade application.
With more back-and-forth, AI could eventually address these concerns. But then again — isn’t the whole point of AI that it should be smarter than us? Shouldn’t it anticipate these possibilities from the start?
A Real Prompt I Tried
To put this to the test, I asked Copilot to implement a logger in my Next.js application. Here was my exact prompt:
```typescript
// Implement a simple logger using Supabase Storage (not Supabase DB).
// - Create one log file per day in a bucket called 'app-logs' (e.g., logs/2025-05-01.json).
// - On each log call, append a new log entry (with timestamp and message) to that file.
// - If the file doesn't exist, create it.
// - Use `@supabase/supabase-js` client with environment variables SUPABASE_URL and SUPABASE_ANON_KEY.
// - Each log entry should include timestamp, level ('info' | 'error'), and message.
// - Export a function `logEvent(level: 'info' | 'error', message: string, metadata?: object)`
// - Keep it efficient for an MVP
```
Part 4: Why AI Isn't Ready to Replace Engineers — Yet
Why AI Isn't There Yet — Real Bottlenecks in Real Projects
To its credit, Copilot did generate a well-structured logEvent function and the related configuration using Supabase Storage exactly as I described.
But here's the issue: the logging mechanism awaited every single log insertion inside route.ts — especially at all potential error points and within catch blocks. As developers, we know that a single API handler can have multiple failure points, and awaiting a logEvent call after each one is not optimal.
When I tested it, each individual log entry (written to Supabase Storage) took around 150–200ms. Multiple such operations per request meant my API response times were taking a noticeable hit.
Now imagine if I hadn’t inspected the implementation carefully — this inefficient design would have gone straight to production. That’s exactly where I expect AI to be better: not just in writing code, but in writing thoughtful, scalable, production-ready code.
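To make the point concrete, here is one way the design could have been better — a minimal sketch (my own, not Copilot's output) of a buffered, fire-and-forget logger. Handlers call `logEvent` synchronously, and a background timer batches the storage writes. The injected `flush` function stands in for the actual Supabase Storage upload, which is an assumption of this sketch rather than code from my project.

```typescript
type LogEntry = { timestamp: string; level: "info" | "error"; message: string };

// Buffered logger: logEvent() is synchronous, so route handlers never pay
// the storage-write latency. A background timer flushes entries in batches.
function createBufferedLogger(
  flush: (entries: LogEntry[]) => Promise<void>, // stand-in for the storage upload
  flushIntervalMs = 5000,
) {
  const buffer: LogEntry[] = [];

  async function flushNow(): Promise<void> {
    if (buffer.length === 0) return;
    const batch = buffer.splice(0, buffer.length); // drain the buffer
    try {
      await flush(batch);
    } catch {
      buffer.unshift(...batch); // keep entries for the next attempt
    }
  }

  const timer = setInterval(() => { void flushNow(); }, flushIntervalMs);

  return {
    // Synchronous: no await needed at each failure point in route.ts.
    logEvent(level: "info" | "error", message: string): void {
      buffer.push({ timestamp: new Date().toISOString(), level, message });
    },
    flushNow,
    stop: () => clearInterval(timer),
  };
}
```

With this shape, ten error points in a handler cost ten in-memory pushes instead of ten 150–200ms round trips — exactly the kind of trade-off I'd expect a "production-grade" answer to reason about unprompted.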
Another Case: Handling Anonymous Users
In PromptOptimizer, I allow anonymous users to make up to 10 free LLM API calls. I had three user categories:
Anonymous Users – no sign-up, no authentication
Signed-in Users – signed up but haven't set up their own LLM provider/API keys
Fully Set Up Users – signed in and configured their own LLM API keys in settings
I use Supabase for authentication, which creates an auth session after successful email verification. But for anonymous users, I needed a custom solution — one that would let them use my own API keys for a limited number of requests (10 max).
So I prompted GPT and Copilot to implement this logic.
It actually understood the requirement well and built a clean implementation. But again — it missed critical real-world loopholes. The solution was built using cookies via NextRequest from Next.js to assign a pseudo userId and track request counts.
Client-Side
usePromptGenerationLimit.ts
```typescript
// usePromptGenerationLimit.ts
import { useState } from 'react';

const PROMPT_LIMIT = 10;

export function usePromptGenerationLimit() {
  const [promptsRemaining, setPromptsRemaining] = useState(PROMPT_LIMIT);
  const [showLimitModal, setShowLimitModal] = useState(false);

  const getUserIdFromCookies = () => {
    const match = document.cookie.match(/(^| )anonymous_user_id=([^;]+)/);
    if (match) return match[2];
    const newId = Math.random().toString(36).substring(2);
    document.cookie = `anonymous_user_id=${newId}; path=/; max-age=31536000`;
    return newId;
  };

  const checkAndIncrementPromptCount = async () => {
    const userId = getUserIdFromCookies();
    const response = await fetch(`/api/demoPromptCount?user=${userId}`);
    const { count } = await response.json();
    if (count >= PROMPT_LIMIT) {
      setShowLimitModal(true);
      return false;
    }
    await fetch(`/api/demoPromptCount`, {
      method: 'POST',
      body: JSON.stringify({ user: userId }),
    });
    setPromptsRemaining(prev => prev - 1);
    return true;
  };

  return {
    promptsRemaining,
    showLimitModal,
    checkAndIncrementPromptCount,
    setShowLimitModal,
  };
}
```
Server-Side
/api/demoPromptCount.ts
```typescript
// /api/demoPromptCount.ts
import { NextRequest, NextResponse } from 'next/server';

const db = new Map<string, number>(); // In-memory mock DB

export async function GET(req: NextRequest) {
  const userId = req.nextUrl.searchParams.get('user') || 'unknown';
  const count = db.get(userId) || 0;
  return NextResponse.json({ count });
}

export async function POST(req: NextRequest) {
  const body = await req.json();
  const userId = body.user || 'unknown';
  const currentCount = db.get(userId) || 0;
  db.set(userId, currentCount + 1);
  return NextResponse.json({ message: 'Count incremented' });
}
```
/api/signedupPromptCount
```typescript
// /api/signedupPromptCount.ts (Demo Version)
import { NextRequest, NextResponse } from 'next/server';

const anonymousDB = new Map<string, number>();
const userDB = new Map<string, { promptCount: number; migrated: boolean }>();

export async function GET(req: NextRequest) {
  const userId = req.headers.get('x-user-id') || 'guest';
  const anonId = req.cookies.get('anonymous_user_id')?.value;

  // Migrate the anonymous count into the user record if needed
  if (anonId && !userDB.get(userId)?.migrated) {
    const anonCount = anonymousDB.get(anonId) || 0;
    const userRecord = userDB.get(userId) || { promptCount: 0, migrated: false };
    userDB.set(userId, {
      promptCount: userRecord.promptCount + anonCount,
      migrated: true,
    });
  }

  const count = userDB.get(userId)?.promptCount || 0;
  return NextResponse.json({ count });
}

export async function POST(req: NextRequest) {
  const userId = req.headers.get('x-user-id') || 'guest';
  const userRecord = userDB.get(userId) || { promptCount: 0, migrated: true };
  userDB.set(userId, {
    ...userRecord,
    promptCount: userRecord.promptCount + 1,
  });
  return NextResponse.json({ message: 'Signed-in count incremented' });
}
```
Note: These demo files use mock databases and simple browser-cookie logic to simulate the real behavior. In production, you would replace these with secure authentication (such as Supabase), a persistent database, and a state manager like Redux. I couldn’t share the actual code from my project 😉
Referring to the files above: in a nutshell, the only cornerstone for checking and storing anonymous users' counts is cookies, right?
Where It Fails the Test of Production Readiness
But here’s the gap: cookies alone are fragile. Why?
What if the user opens a new incognito window? That’s a new session, new cookies — and now they get 10 more free calls.
What if the user simply clears cookies? Again, the limit resets.
Even though I had a rate limiter in place, it was tied to the user's IP address — which isn't a reliable identifier in many environments. Users may be behind VPNs, and the same IP is often shared across multiple devices, so blocking based on IP alone would also block legitimate users.
These are the kind of loopholes that only senior engineers with real production experience are likely to think about. And that's the missing layer in AI-generated solutions.
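For illustration, here is a minimal sketch (my own, not from PromptOptimizer) of the kind of mitigation a senior engineer might reach for: combining several weak signals — the cookie id plus a hash of IP and user agent — so that wiping any single signal doesn't reset the quota. The signal choices and the `Map` store are assumptions; in production you'd persist counts in a real database.

```typescript
import { createHash } from "node:crypto";

const LIMIT = 10;
const counts = new Map<string, number>(); // stand-in for a persistent store

// Derive one key per available signal. Names and hashing scheme are
// illustrative assumptions, not a production fingerprinting design.
function signalKeys(cookieId: string | null, ip: string, userAgent: string): string[] {
  const keys: string[] = [];
  if (cookieId) keys.push(`cookie:${cookieId}`);
  keys.push(`iphash:${createHash("sha256").update(ip + userAgent).digest("hex")}`);
  return keys;
}

function checkAndIncrement(cookieId: string | null, ip: string, ua: string): boolean {
  const keys = signalKeys(cookieId, ip, ua);
  // Over the limit if ANY signal has hit it — so clearing cookies or
  // opening an incognito window alone is not enough to reset the quota.
  const used = Math.max(...keys.map((k) => counts.get(k) ?? 0));
  if (used >= LIMIT) return false;
  for (const k of keys) counts.set(k, (counts.get(k) ?? 0) + 1);
  return true;
}
```

This still isn't bulletproof (no single anonymous identifier is), but it raises the cost of abuse well above "open incognito" — and that's precisely the reasoning step the AI-generated solution never took on its own.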
The Verdict: Where AI Stands Today
This brings us to a conclusion that’s hard to ignore:
Either AI is not replacing human engineers yet — or those saying so are grossly overestimating its current capabilities.
AI is impressive. It’s a powerful assistant. But for now, it’s not a replacement for seasoned human judgment, system design, or experience in handling real-world edge cases. Not yet.
Written by Vishal Lalit Chhadekar