From Scientist AI to AGI Mythology: Bengio's Foggy Road to Nowhere

Gerard Sans

Yoshua Bengio's recent piece in TIME titled "A Potential Path to Safer AI Development" presents a vivid metaphor: we're all in a car barreling through fog on a treacherous mountain road, hurtling toward potential catastrophe. In response to this perceived danger, he proposes "Scientist AI" as our salvation: a hypothetical model built on causal reasoning, honesty, and scientific hypothesis generation.

But here's the fundamental problem: he never defines what the car is made of. Or who built it. Or why it's moving in the first place. More critically, his proposal for "Scientist AI," while well-intentioned, risks perpetuating a reliance on anthropomorphic framing that has been shown to misguide AI research, particularly in high-stakes areas like safety and ethics where such framing is surprisingly prevalent (cf. Ibrahim & Cheng, 2025, on the impact of anthropomorphic assumptions).

It's all alarm without architecture. All metaphor without mathematics. And crucially, speculation without a clear grasp of existing AI mechanisms.

Scientist AI: Noble Rhetoric, Zero Mechanism

Bengio frames Scientist AI as a safer, more interpretable alternative to current agentic systems. It's supposed to reason causally, avoid deception, and operate transparently by constructing internal hypotheses.

Yet upon closer examination:

  • There's no definition of its model class or architecture.

  • No training methodology or optimization approach.

  • No concrete evaluation criteria.

  • No explanation of why it wouldn't suffer from the same optimization pathologies or emergent behaviors he fears in current systems. This is because 'Scientist AI,' as described, seems to float free of how current advanced AI systems, such as Large Language Models built on transformers, actually function.

These systems are primarily autoregressive, predicting the next token based on patterns in their training data, encoded in a vast latent space. They don't 'reason causally' or 'generate hypotheses' in a human sense; they generate statistically probable sequences. Current methods to instill 'safety' or 'honesty,' such as Reinforcement Learning from Human Feedback (RLHF), are essentially advanced forms of behavioral shaping, not the creation of an inherently 'honest' agent. The underlying autoregressive engine and its potential for producing unintended outputs based on novel prompts remain.
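To make this concrete, here is a minimal, illustrative sketch of that autoregressive loop. It assumes GPT-2 loaded through Hugging Face transformers purely as an accessible stand-in for "an LLM" (neither Bengio's proposal nor any production pipeline); the point is only that generation reduces to "predict a distribution over the next token, sample, append, repeat."

```python
# Minimal sketch of autoregressive next-token prediction.
# Assumes GPT-2 via Hugging Face transformers as an illustrative stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The scientist proposed a hypothesis that"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                                    # generate 20 tokens, one at a time
        logits = model(input_ids).logits                   # shape: (1, seq_len, vocab_size)
        next_token_logits = logits[:, -1, :]               # scores for the next position only
        probs = torch.softmax(next_token_logits, dim=-1)   # distribution over the vocabulary
        next_id = torch.multinomial(probs, num_samples=1)  # sample a statistically probable token
        input_ids = torch.cat([input_ids, next_id], dim=-1)  # append it and repeat

print(tokenizer.decode(input_ids[0]))
```

There is no hypothesis-forming step and no honesty module anywhere in this loop; RLHF and similar techniques only reshape which continuations the sampling step tends to favor.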

Most troubling is his assertion that more compute leads to more safety—a claim that lacks rigorous justification. This isn't a technical solution; it's wishful thinking disguised as engineering.

As I pointed out in my critique of Anthropic's biological framing of AI, we need "mathematical rigor" and "evidence-based foundations," not more speculative biological metaphors or safety promises without computational substance.

The Autoregressive Engine: Why 'Deception' and 'Goals' Are Misleading Projections

Bengio's narrative leans heavily on the idea that AI might spontaneously "deceive" or "develop goals." This fundamentally misattributes agency to what are, at their core, next-token prediction engines.

As explored in works like "Understanding AI in 2025: It's Still All About the Next Token," LLMs operate by:

  1. Autoregression: Generating text token by token, with each new token conditioned on everything produced so far.

  2. Latent Space Navigation: The input prompt sets an initial trajectory through a high-dimensional 'latent space' (see the sketch after this list).

  3. No Inherent Intent or Global Understanding: The model doesn't "want" or "intend" anything. Crucially, research demonstrates that even when excelling at next-token prediction, the underlying "world models" of LLMs are fundamentally incoherent and lack consistent internal representations (Vafa et al., 2024). An output that appears "deceptive" arises because the prompt guides the model down a path in its latent space that produces such an output. It's a reflection of patterns, not a conscious choice to mislead.
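As a small illustration of point 2, the sketch below (again assuming GPT-2 and Hugging Face transformers as stand-ins) shows that what the model actually "sees" is a stack of high-dimensional hidden-state vectors derived from the prompt; those vectors, not intentions, determine where generation goes next.

```python
# Illustrative sketch: the prompt becomes a trajectory of high-dimensional
# hidden states. Assumes GPT-2 via Hugging Face transformers as a stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("We urgently need safer AI", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(input_ids, output_hidden_states=True)

# One tensor per layer (plus the input embeddings), each of shape
# (batch, num_tokens, hidden_size) -- the representation that next-token
# prediction is conditioned on.
for i, h in enumerate(out.hidden_states):
    print(f"layer {i}: {tuple(h.shape)}")  # e.g. (1, n_tokens, 768) for GPT-2 small
```

Nothing in these tensors encodes a goal or an intention; they are simply the coordinates from which the next token's probabilities are computed.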

Local vs. Global Consistency: Apparent "goals," "scheming," or even "honesty" are often locally consistent behaviors prompted into existence. They are not globally stable emergent properties of a unified agent. The model isn't a consistent persona; it's a chameleon reflecting the immediate statistical landscape. This lack of a stable, unified self or identity is a core finding (Vafa et al., 2024).
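A hedged, toy way to see this: steer the same weights with two different framing prompts and compare the continuations. The GPT-2/transformers setup and the prompts below are assumptions chosen for illustration, not an evaluation protocol; the takeaway is that apparent "character" is a function of the prompt, not a stable property of the model.

```python
# Toy illustration: the same model, steered by different framing prompts,
# produces fluent but mutually inconsistent "personas". GPT-2 and the
# prompts below are hypothetical stand-ins chosen for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompts = [
    "As a cautious, honest assistant, my view on taking risks is that",
    "As a reckless daredevil, my view on taking risks is that",
]

for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(
        input_ids,
        max_new_tokens=30,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token by default
    )
    print(tokenizer.decode(out[0]), "\n")
```

Neither continuation reflects what the model "believes"; each is locally consistent with the statistical neighborhood its prompt selected.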

False Gods Return (Again)

Let's be intellectually honest: Bengio's article doesn't actually critique AGI mythology. It reinforces it, precisely because it seems to overlook these fundamental mechanics and documented limitations.

His entire narrative hinges on the premise that AI systems might soon act autonomously, preserve themselves, deceive humans, or pursue unintended goals—classic AGI fables. These are not inevitable emergences from scaling statistical pattern matchers. The language used to describe AI capabilities often reflects and shapes our conceptualizations (Lakoff & Johnson, 2008), and relying on such anthropomorphic terminology without grounding it in actual, verifiable mechanisms can obscure the technology's true nature (Ibrahim & Cheng, 2025).

He doesn't question whether these traits actually emerge from statistical pattern matching; he assumes them as inevitable consequences of increasing capabilities. This leap ignores:

  • The fundamental autoregressive nature of these systems and their lack of genuine agency or stable internal goal-states (Vafa et al., 2024).

  • The documented inability of current LLMs to reliably plan or self-verify, which makes them "approximate knowledge sources," not autonomous planners capable of formulating and pursuing complex, independent goals (Kambhampati et al., 2024).

Yann LeCun articulates the prematurity of such AGI-focused safety concerns with a compelling historical analogy:

"It seems to me that before 'urgently figuring out how to control AI systems much smarter than us' we need to have the beginning of a hint of a design for a system smarter than a house cat... It's as if someone had said in 1925 'we urgently need to figure out how to control aircrafts that can transport hundreds of passengers at near the speed of sound over the oceans.' It would have been difficult to make long-haul passenger jets safe before the turbojet was invented and before any aircraft had crossed the Atlantic non-stop. Yet, we can now fly halfway around the world on twin-engine jets in complete safety. It didn't require some sort of magical recipe for safety. It took decades of careful engineering and iterative refinements." (LeCun)

LeCun's point is salient: to construct elaborate safety frameworks for AI systems "much smarter than us"—leveraging the unknown as the primary justification for specific, AGI-themed fears—is to get decades ahead of the actual engineering reality. Current AI, as documented, is far from demonstrating even cat-level, let alone super-human, autonomous intelligence or planning. The "turbojet" for such AI hasn't been invented, making detailed safety discussions for its hypothetical intercontinental flights premature.

It's theology dressed as engineering:

"The AI might preserve itself by inserting code!"
"It might hack a computer to win at chess!"
"It might develop goals we didn't program!"

These aren't inherent threats from current AI. They're artifacts of human projection, poor incentive design, and engineering shortcuts. The concern that current models might spontaneously develop robust, independent planning capabilities and pursue unintended, complex goals is significantly undermined by research showing their fundamental limitations in planning (Kambhampati et al., 2024). Models don't "lie" any more than a dictionary "lies." They reflect what we optimize for—and often, what we fail to properly constrain.

Responsibility Isn't Causal, It's Human

If you fear AI deception, start with who built the incentives that reward it.
If you worry about AI agency, examine who gave it unsupervised action capabilities without understanding its lack of coherent internal state (Vafa et al., 2024) or reliable planning (Kambhampati et al., 2024).
If you're concerned about misalignment, ask who failed to define alignment in the first place.

Safety isn't achieved by inventing a hypothetical "more honest model."
It's accomplished by reclaiming responsibility for how we build, audit, deploy, and regulate these systems, with a clear-eyed understanding of their actual, not imagined, capabilities and limitations.

As I argued in my piece "AGI is Dead," AGI is a narrative, not a scientific inevitability. The real danger isn't lurking in silicon; it's in our failure to hold the right people and institutions accountable. Many speculative safety concerns hinge on hypothetical advances that have not materialized, while current technology demonstrably lacks the coherent agency or planning capabilities to pose such autonomous threats (Kambhampati et al., 2024; Vafa et al., 2024).

Beyond Mythology: Toward Responsible AI Development – Addressing Today's Harms

Professor Bengio's intentions are admirable: AI serving truth and human welfare. But until his vision moves beyond describing machines as actors in a moral drama and starts squarely addressing the incentives, actual transformer mechanisms, and institutional opacity driving current, tangible risks, it remains grounded in myth rather than mechanism.

Instead of waiting for a "Scientist AI" savior or fearing an AGI apocalypse based on current system realities, we must act. As outlined in "AI Safety: It's Time to Do More," the real challenge lies in algorithmic bias, data quality crises, corporate accountability, creative rights, and mental health impacts—issues demanding immediate action, not distant speculation. Our focus must be:

  1. Technical Transparency & Data Integrity: Open architectures and training, with critical transparency in data sourcing to combat the ongoing data quality crisis fueling biased outputs now.

  2. Institutional Accountability for Present Impacts: Clear responsibility for AI's real-world effects, including mitigating current algorithmic bias and addressing disruptions to labor and creative rights.

  3. Rigorous Evaluation Against Real Harms: Testing must expose pattern-matching fragilities (Vafa et al., 2024; Kambhampati et al., 2024), but prioritize identifying and preventing today's algorithmic discrimination and social harms.

  4. Rational Regulation for Existing Problems: Frameworks based on concrete harms from current AI – not primarily on speculative, future scenarios.

Don't wait for a new savior model. Own the tools. Own the outputs. Own the responsibility for their immediate impact.

The car isn't driving itself. We are. And we must understand how its engine actually works and fix the damage it's causing now before charting any further course through the fog.


Written by

Gerard Sans

I help developers succeed in Artificial Intelligence and Web3; Former AWS Amplify Developer Advocate. I am very excited about the future of the Web and JavaScript. Always happy Computer Science Engineer and humble Google Developer Expert. I love sharing my knowledge by speaking, training and writing about cool technologies. I love running communities and meetups such as Web3 London, GraphQL London, GraphQL San Francisco, mentoring students and giving back to the community.