The Snake Oil of AI in 2025: When One Technology Is Suspiciously the Solution to Everything

Gerard Sans

In the rapidly evolving landscape of artificial intelligence, we're witnessing a curious phenomenon: a single technology being marketed as the solution to virtually every problem imaginable. OpenAI, once focused on transparent research, now seems to live in its own reality, one where its transformer models represent nascent intelligence that can tackle anything its marketing department dreams up.

Modest Origins: Just Text Completion

Let's remember where this all began. The transformer architecture was originally introduced for sequence-to-sequence tasks such as machine translation, and the GPT models built on it were trained for one specific job: text completion. They were good at continuing text in a coherent manner based on patterns seen in their training data. During this humble beginning, the technology had clear boundaries and limitations.

Then came the so-called "emergent behaviors" - capabilities like coding, fact retrieval, and basic reasoning that weren't explicitly programmed but appeared as the models scaled. These were interesting developments, but hardly the dawn of general intelligence that some would have us believe.

The First Stretch: Conversation

The pivot to conversation was the first major instance of overreaching. By wrapping a text completion engine in a chat interface and adding some clever prompting, these models suddenly became "assistants" rather than what they truly are: sophisticated pattern matchers.

This reframing was brilliant marketing but created a fundamental misunderstanding about what these systems actually do. They don't think, reason, or understand - they predict what text should come next based on statistical patterns in their training data.
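
To make the point concrete, here's a deliberately tiny sketch of completion-by-statistics: a bigram model that continues text using nothing but co-occurrence counts. It is nowhere near a transformer (no attention, no neural network, and the toy corpus is invented), but it shows how plausible-looking continuations can fall out of statistics alone.

```python
# Toy illustration only: continuing text purely from co-occurrence statistics.
# This is not a transformer; it just makes "predict what comes next" concrete.
from collections import Counter, defaultdict

corpus = "the model predicts the next word the model sees the pattern".split()

# Count how often each word follows each other word (bigram statistics).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def complete(prompt_word: str, length: int = 5) -> str:
    """Greedily continue a prompt by always picking the most frequent successor."""
    out = [prompt_word]
    for _ in range(length):
        successors = follows.get(out[-1])
        if not successors:
            break
        out.append(successors.most_common(1)[0][0])
    return " ".join(out)

print(complete("the"))  # a fluent-looking continuation produced by counting alone
```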

The Solution for Everything You Can Imagine

If you lack a technical background in AI, it's difficult not to be impressed by these systems. They appear knowledgeable across countless domains. This creates what I call the "expertise trap" in evaluating LLMs: if you're an expert in a subject, you quickly spot the model's errors in your field. But most people aren't experts in most fields, making it nearly impossible for the general public to notice the cracks.

The appearance of expertise, combined with the lack of tools to verify outputs and the inherent uncertainty in model predictions, turns both speculation and debunking into matters of opinion rather than fact. When even researchers can't agree on the capabilities and limitations of these systems, charlatans have plenty of room to make extravagant claims.

The Intelligence Narrative Takes Off

This environment created the perfect conditions for the "intelligence narrative" to flourish. By anthropomorphizing these systems and using terms like "thinking," "understanding," and "learning," companies push the idea that we're dealing with something approaching human cognition rather than sophisticated statistical models.

The Fact Retrieval Narrative

One capability often highlighted is the model's ability to answer factual questions. Since Wikipedia and other reference materials were part of the training data, these models can often provide accurate-sounding answers to general questions. It's not always perfect, but it's good enough to fool most people most of the time.

However, using LLMs as knowledge bases or for data retrieval feels fundamentally forced. The design is clearly optimized for pattern matching, not information storage or retrieval. There's no way to control what information the model retains, frequency biases make recall unreliable (well-represented facts crowd out rare ones), and context dependency means the right answer may not always make it to the output.

Confusing Data Retrieval with Cognitive Function

Perhaps the most troubling aspect is the conflation of data retrieval with actual cognitive functions. When a model produces factual information, it's not "remembering" or "knowing" in any human sense - it's regenerating patterns from its training data through a complex statistical process.

Yet this capability is presented as evidence of intelligence rather than what it really is: a sophisticated form of pattern recognition that sometimes produces accurate information and sometimes doesn't.

The Remote Worker Fallacy

The ultimate stretching of the technology is the portrayal of these systems as something akin to remote workers or persons. This framing confuses pattern matching with social identity and agency. These systems don't have intentions, goals, or understanding - they produce outputs based on inputs according to their training.

The Failure of Stretching: When Design Limitations Become Painfully Obvious

The practice of applying the exact same transformer architecture to an ever-expanding set of requirements without fundamental redesign is now revealing critical failures across multiple domains. These aren't mere bugs but symptoms of architectural mismatch between the technology and its intended use cases.

The Context Trap: Stochastic Funneling in Action

Have you ever had a lengthy conversation with an AI assistant only to find it increasingly difficult to shift topics? This "context inertia" becomes painfully apparent in any interaction that extends beyond a few turns.

Example Failure: A user collaborating with an AI on a marketing strategy suddenly needs information about legal compliance. Despite clear requests, the AI continues steering responses back to marketing concepts, requiring the user to start a new conversation entirely to break free from the marketing context.

The transformer mechanisms create what I call "context drag" - the longer a conversation goes, the harder it becomes to shift direction. This isn't a minor inconvenience; it's a fundamental limitation of the architecture being used far beyond its intended design parameters.
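
A rough sketch of why, under the standard assumption that chat systems resend the entire history on every turn (the `call_model` below is a hypothetical placeholder, not any vendor's real API): by the time the user asks about legal compliance, that single request is competing against a prompt dominated by marketing turns, and the next prediction is conditioned on all of it.

```python
# Sketch under an assumption: chat interfaces resend the full history each turn,
# so earlier topics keep conditioning every later prediction.
# `call_model` is a hypothetical placeholder, not a real API.

messages = [{"role": "system", "content": "You are a helpful assistant."}]

def send(user_text: str) -> None:
    messages.append({"role": "user", "content": user_text})
    # reply = call_model(messages)  # hypothetical: the model sees the whole list
    # messages.append({"role": "assistant", "content": reply})

# Nine turns building up a marketing context...
for i in range(9):
    send(f"Refine the marketing strategy, point {i + 1}.")

# ...then one attempt to change topic.
send("New topic: what are the GDPR compliance requirements?")

marketing_turns = sum("marketing" in m["content"] for m in messages)
print(f"{len(messages)} messages in the prompt; {marketing_turns} are about marketing")
```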

AI Agents: Dragging a Heavy Context Chain

The problem becomes even more severe with AI agents, which require enormous amounts of context that only grows over time.

Example Failure: An AI agent tasked with research accumulates so much context after a few hours that it begins contradicting itself, forgetting earlier findings, and becoming increasingly unable to incorporate new information without distortion.

Navigating a lengthy interaction with these systems becomes like trying to maneuver a container ship through a narrow canal. You simply can't move freely, and eventually, you're forced into a "hard reset" - starting from scratch.
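
To illustrate the mechanics (the window size, token estimate, and truncation policy below are invented numbers, not any vendor's actual behaviour): once an agent loop has appended enough findings to exhaust its context budget, something has to be dropped, and what gets dropped is exactly the earlier work the agent is later accused of "forgetting".

```python
# Illustrative only: an agent loop filling a fixed context budget.
# The window size, token estimate and drop-oldest policy are invented here.

CONTEXT_WINDOW_TOKENS = 8_000   # assumed budget for the sketch
TOKENS_PER_FINDING = 300        # assumed average size of one finding

context: list[str] = []

def add_finding(finding: str) -> None:
    context.append(finding)
    # Crude policy: drop the oldest findings once the budget is exceeded.
    while len(context) * TOKENS_PER_FINDING > CONTEXT_WINDOW_TOKENS:
        dropped = context.pop(0)
        print(f"Dropped from context: {dropped!r}")

for step in range(1, 41):
    add_finding(f"Finding from research step {step}")

print(f"Only the last {len(context)} findings are still visible to the model")
```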

When Attention Mechanisms Backfire

Why does this happen? The transformer's attention mechanism and autoregressive behavior - strengths in short-form text completion - become liabilities in extended interactions.

Example Failure: A coding assistant that brilliantly generates small functions begins to lose coherence when maintaining complex class relationships across hundreds of lines, repeatedly forgetting its own variable naming conventions and architectural decisions made earlier in the conversation.

The transformer was never designed for long-term interaction but for language completion. Adding more functionality without proper re-engineering is like overloading a bicycle with five passengers. The original design wasn't built for this weight, and it shows.
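
For readers who want to see the scaling rather than take it on faith, here is a minimal scaled dot-product attention in plain NumPy over random vectors (no learned weights, no causal mask): the score matrix holds one entry per pair of tokens, so the work grows quadratically as the conversation, and therefore the context, gets longer.

```python
# Minimal scaled dot-product attention over random vectors, to show how the
# pairwise score matrix grows with context length. Illustrative sketch only.
import numpy as np

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over all positions
    return weights @ v

d = 64                                               # embedding size (arbitrary)
for n in (128, 512, 2048):                           # tokens in context
    q = k = v = np.random.randn(n, d)
    attention(q, k, v)
    print(f"{n:>4} tokens -> {n * n:>9,} pairwise attention scores")
```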

The Stateless Problem: Taking a Road Car Off-Road

At its core, the transformer is stateless, with no true memory beyond what's in its immediate context window. The forceful imposition on LLMs to operate in territories they weren't engineered for is akin to taking a luxury sedan off-roading.

Example Failure: An AI assistant used for ongoing project management loses track of key deliverables over time, requiring constant reminders and recaps that wouldn't be necessary with a system actually designed for maintaining state across sessions.

Soon enough, just like a road car in the wilderness would lose its wheels and break its transmission, these models break down in ways that reveal their fundamental unsuitability for these tasks. No amount of garage work afterward can fix a design that wasn't meant for the terrain.
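
"Stateless" is easy to see in code. In the usual API pattern, nothing persists between calls except what the caller resends; the `generate` function below is a hypothetical stand-in for a model call, there to show the shape of the interaction rather than any specific product.

```python
# Statelessness, illustrated with a hypothetical generate() stand-in.
# Each call is conditioned only on the text passed in; nothing carries over.

def generate(prompt: str) -> str:
    """Hypothetical stand-in for a model call."""
    return f"<completion conditioned on {len(prompt)} characters of prompt>"

# Session 1: a "fact" is stated during the conversation.
generate("Note: the project deadline is 14 March.")

# Session 2: a fresh call knows nothing about session 1.
print(generate("When is the project deadline?"))

# Any apparent memory has to be rebuilt by the caller on every call,
# e.g. by prepending stored notes to the prompt.
notes = "Earlier session: the project deadline is 14 March.\n"
print(generate(notes + "When is the project deadline?"))
```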

Market-Driven Development Without Technical Foundation

This pattern of failures points to an excessive market-driven approach not backed by technical prowess. OpenAI appears more focused on announcing new capabilities than addressing fundamental limitations in their architecture.

The reality is that many of these failures aren't surprising to AI researchers who understand the transformer architecture's constraints. What's surprising is the willingness to market these systems as appropriate solutions for use cases they were never designed to handle.

This approach prioritizes market buzz over engineering soundness - a strategy that may win headlines in the short term but ultimately undermines confidence in the technology as users encounter its limitations firsthand. True advancement in AI requires acknowledging architectural limitations and developing new approaches, not merely stretching existing ones beyond their breaking point.

Conclusion

As we navigate the hype cycle of AI, it's crucial to maintain critical thinking. The same technology that's great at text completion is being stretched beyond recognition to serve as conversationalist, knowledge base, reasoning engine, and even quasi-person. While these models are impressive feats of engineering, presenting one technology as the solution to everything has historically been the domain of snake oil salesmen.

The next time you hear about an AI system that seems to do everything, remember: if it sounds too good to be true, it probably is. True progress in AI will likely come not from stretching a single architecture to its breaking point, but from developing specialized systems for different tasks, with a clear understanding of their capabilities and limitations.

