The Silicon Valley AI Bubble: OpenAI’s "Thinking Model" Circus Act


Welcome to Silicon Valley's latest spectacle, where OpenAI's "thinking models" play the role of the bearded lady—a classic misdirection in the AI circus. Under the bright TED stage lights, OpenAI's star researcher Noam Brown sells an illusion: language models that seemingly pause to "think," complete with animated ellipses to suggest deep contemplation. But like any good circus act, what you see isn't what you get.
The Technical Sleight of Hand
Let's pull back the curtain on what "thinking models" really are: GPT-4o with a delay mechanism and some reinforcement learning patches. When you prompt these models, you're shown a thinking bubble animation—a theatrical touch designed to suggest profound contemplation. In reality, it's just extended token sampling with structured output formatting, dressed up as cognitive processing. This isn't innovation; it's stagecraft.
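The mechanism described above can be sketched in a few lines. This is a purely hypothetical toy, not OpenAI's actual implementation: a stand-in sampler emits hidden "reasoning" tokens drawn from the same distribution as the visible answer, which is all that "extended token sampling" amounts to.

```python
import random

# Toy stand-in for an autoregressive language model: it samples tokens
# from a fixed vocabulary distribution. Purely illustrative.
VOCAB = ["yes", "no", "maybe", "therefore", "because"]

def sample_token(rng):
    return rng.choice(VOCAB)

def answer(prompt, rng, thinking_tokens=0):
    # "Thinking" here is just extra sampling from the same distribution,
    # hidden from the user, before the visible answer is emitted.
    hidden = [sample_token(rng) for _ in range(thinking_tokens)]
    visible = sample_token(rng)
    return hidden, visible

rng = random.Random(0)
hidden, visible = answer("Is P = NP?", rng, thinking_tokens=8)
# The hidden "reasoning" comes from the same vocabulary as the answer:
assert set(hidden) <= set(VOCAB) and visible in VOCAB
```

The point of the sketch is that nothing in the "thinking" phase draws on any capability the sampler did not already have; it only spends more time sampling.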
The most disturbing aspect? This elaborate show attempts to obscure a fundamental technical limitation: no amount of inference-time computation can expand a model's capabilities beyond its pretraining latent space. It's like trying to squeeze water from a stone—you can press harder and longer, but you can't extract what isn't there.
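The "water from a stone" point can be made concrete with a toy experiment. Assume (hypothetically) a sampler whose support is fixed at training time; no inference-time sampling budget, however large, ever produces a token outside that support.

```python
import random

SUPPORT = {"a", "b", "c"}  # tokens the "pretrained" model can emit
NOVEL = "d"                # a token outside the pretraining support

def sample(rng):
    return rng.choice(sorted(SUPPORT))

rng = random.Random(42)
# A large inference-time budget only redistributes probability within
# the support; it never conjures the novel token.
draws = [sample(rng) for _ in range(100_000)]
assert NOVEL not in draws
assert set(draws) == SUPPORT
```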
The Context: OpenAI's Scaling Crisis
As we approach the two-year anniversary of GPT-4's release, OpenAI faces an existential crisis. Their scaling strategy—the cornerstone of their technical narrative—has crumbled. While other frontier labs have pivoted dramatically toward smaller, more efficient models in the 2-8 billion parameter range, OpenAI remains trapped by its own rhetoric, unable to admit that its 1.5-trillion-parameter scaling dreams have hit a wall.
Internal tests reportedly show GPT-5 failing to deliver meaningful improvements, creating mounting pressure as the industry's patience wears thin. This isn't just a temporary setback—it's a fundamental challenge to OpenAI's identity as the field's presumptive leader.
The "Thinking Model" Misdirection
Enter o1 and the "thinking model" narrative—a masterclass in corporate sleight of hand. Unable to extract more performance from pretraining, OpenAI has resorted to dressing up inference-time computation as revolutionary progress. It's like claiming a car goes faster because you've added a longer delay between pressing the gas pedal and the engine responding.
The Poker Tale: Marketing Over Mathematics
In this context, Brown's poker story takes on new significance. The 2015 competition—880,000 hands with $120,000 in prize money—and the 2017 rematch with $200,000 at stake aren't just historical anecdotes. They're carefully chosen narratives that reinforce the illusion that more computation time equals deeper understanding.
The bot's trillion-hand training across thousands of CPUs sounds impressive until you realize it's hitting the same fundamental limits as GPT-5—more computation yielding diminishing returns. The "System 2 Thinking" terminology borrows credibility from psychology while masking the reality: it's still just probability sampling, just slower.
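The diminishing-returns claim can be quantified with a standard best-of-N calculation (an idealized model, not the poker bot's actual training curve): if each independent sample solves a task with probability p, then N samples succeed with probability 1 - (1 - p)^N, and the marginal gain from each extra sample shrinks geometrically.

```python
# Best-of-N success probability: 1 - (1 - p)**N.
# The marginal gain from sample N+1 is p * (1 - p)**N, which shrinks
# geometrically: an idealized picture of diminishing returns from
# extra inference-time computation.
p = 0.3  # hypothetical per-sample success rate

def best_of_n(p, n):
    return 1 - (1 - p) ** n

gains = [best_of_n(p, n + 1) - best_of_n(p, n) for n in range(1, 6)]
# Each additional sample buys less than the previous one:
assert all(g1 > g2 for g1, g2 in zip(gains, gains[1:]))
```

Under this idealized model, doubling the sampling budget never doubles the success rate; past a point, extra computation mostly re-derives answers already within reach.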
O series: The Last Stand
OpenAI's o series, o1 and o3, represents more than just another model release—it's OpenAI's attempt to maintain relevance in a field that's rapidly moving away from their scaling thesis. By repackaging GPT-4o with reinforcement learning tweaks and theatrical pauses, they're trying to transform their scaling failure into a virtue. But you can't escape the limitations of your pretraining by simply spending more time thinking about them.
The Language of Desperation
The shift in OpenAI's rhetoric is telling. Gone are the confident predictions about scaling to trillion-parameter models. Instead, we get careful constructions about "thinking time" and "system 2 processing"—terms designed to mask technical limitations behind psychological metaphors. When Brown claims o1 "benefits by being able to think for longer," he's not describing a breakthrough; he's performing damage control.
The Technical Debt Mounts
The real danger lies in the precedent this sets. By prioritizing the appearance of progress over technical soundness, OpenAI risks pushing the field toward flashy demos rather than fundamental advances. The reinforcement learning modifications and artificial delays aren't just cosmetic changes—they're technical debt that will compound over time, potentially destabilizing the very foundations they're built upon.
A Path Forward: Beyond the Illusion
The industry stands at a crossroads. While OpenAI performs its "thinking model" show, real progress is happening in labs focused on model efficiency and architectural innovation. The future lies not in theatrical pauses or psychological metaphors, but in honest engagement with technical limitations and genuine innovation in model architecture.
As the audience files out of Brown's TED talk, dazzled by visions of AI omniscience, we must remember: behind every successful circus act is a careful misdirection. The question isn't whether AI will transform society—it's whether we'll let marketing-driven theatrics derail genuine technical progress. The emperors of Silicon Valley may have new clothes, but their "thinking models" are still just GPT-4o in a carnival costume.
Written by Gerard Sans