Anthropic's Detachment from Science Fuels Welfare Fantasies and Biological Fictions

Gerard Sans

A deeply concerning pattern is emerging from major AI labs like Anthropic: a startling detachment from scientific reality, where performative spiel and speculative fiction are prioritized over technical grounding. This is not a minor misstep; it is a series of major failures. Nowhere is this more evident than in the charade surrounding so-called "reasoning models." These models use the exact same transformer architecture and backpropagation mechanisms as their predecessors, yet they are heralded as possessing emergent reasoning, a claim that then serves as a trampoline for far-fetched speculations about consciousness and AI welfare so removed from reality that they are frankly embarrassing. This foundational misrepresentation of current capabilities fuels further outlandish ventures, such as their embrace of "AI welfare" research and the misleading use of biological metaphors in publications like "On the Biology of a Large Language Model" (Lindsey, Gurnee, et al., March 27, 2025). These developments paint a grim picture of an organization, and perhaps an industry segment, dangerously adrift from technical sobriety.

The "Reasoning Model" Charade: A Case Study in Scientific Detachment and Technical Misdirection

The most glaring evidence of Anthropic's detachment lies in their promotion and technical understanding—or lack thereof—of their own "reasoning models." As my recent analyses (e.g., "Unmasking the Reality of 'Reasoning' Models Like Claude 3.7 Sonnet," March 3, 2025, and "The Rise and Fall of Reinforcement Learning for LLM 'Reasoning'," May 11, 2025) have exposed, what Anthropic and others tout as breakthroughs in AI "reasoning" are often illusions built on shaky foundations and a misunderstanding (or misrepresentation) of underlying mechanisms.

Techniques like Reinforcement Learning via Chain of Thought (RL via CoT) or Test-Time Scaling, which underpin many of these "reasoning" claims, do not introduce new cognitive machinery. Instead, RL via CoT relies on sophisticated prompt engineering and output shaping to drive generation. This can be described as a form of "stochastic funneling," where a "token cushion" (often at significant computational and monetary cost) helps bring certain pre-existing activations in the latent space into focus. Crucially, these techniques are still entirely reliant on the latent space established during the initial pretraining phase and the availability of relevant information within the training corpus for that specific region of the latent space. The underlying mechanism remains the unchanged transformer architecture.
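
To make this concrete, here is a minimal sketch (using gpt2 from Hugging Face purely as a stand-in; this is not Anthropic's pipeline) showing that a "reasoning" prompt and a plain prompt pass through the identical transformer forward pass. The only differences are the extra input tokens and the larger token budget, exactly the "token cushion" described above.

```python
# Minimal sketch: the same weights, attention mechanism, and sampling loop
# serve both prompts; the chain-of-thought variant just spends more tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model, not Claude
model = AutoModelForCausalLM.from_pretrained("gpt2")

question = "A train travels at 60 km/h for 2 hours. How far does it go?"
prompts = {
    "plain": question,
    "chain-of-thought": question + "\nLet's think step by step.",  # the "token cushion"
}

for name, prompt in prompts.items():
    inputs = tokenizer(prompt, return_tensors="pt")
    # No new cognitive machinery is invoked here: generation is the same
    # autoregressive next-token loop in both cases.
    output = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    print(f"--- {name} ---")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```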

Recent research starkly confirms these limitations:

  • Performance Bounded by Pretraining, Not True Enhancement: The paper "Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?" (Yue et al., arXiv:2504.13837) compellingly argues that what we're witnessing isn't the emergence of new reasoning capabilities but rather the optimization of pre-existing patterns. Their findings reveal that while RL-trained models might achieve impressive pass@1 rates (getting the answer right on the first try), the underlying base models often already contained those correct answers, findable with enough attempts (see the pass@k sketch after this list). This strongly suggests that performance is bounded by pretraining, not fundamentally extended by RL. Pretraining sets the search space, and RL merely optimizes navigation within that pre-defined space.

  • Pattern Matching Over Genuine Reasoning: The analysis in "(How) Do reasoning models reason?" (arXiv:2504.09762) suggests that improvements are more attributable to better pattern matching and plausible output generation for specific, learned tasks than to a fundamental enhancement of underlying reasoning. These models still struggle with generalization and can generate confident but false justifications, especially when faced with variations or unsolvable problems.

  • Methodological Flaws and Output Space Distortion: "A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility" (arXiv:2504.07086) highlights that evaluation methods often lack rigor. Progress in "reasoning" frequently "outpaces methodological rigor," with evaluations lacking transparency, robustness, or statistical grounding. Reported improvements often reflect a tiny fraction of the total output space, and standardized benchmarks can miss massive amounts of output space distortion that simply goes unreported. Performance gains can hinge on subtle, unreported implementation choices, not genuine breakthroughs.

  • Latent Space Degradation and Prompting Equivalency: If fine-tuning has already been applied before RL, the latent space may already be distorted or even collapsed. This removes the combinatorial expanse that gives language models their power, meaning performance can, in fact, only be reduced from the original pretraining potential. Furthermore, the search space that RL painstakingly navigates is often equally exploitable by well-crafted regular prompting techniques, without the overhead and complexity of RL.
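
To make the pass@1 versus pass@k distinction above concrete, here is a short sketch using the standard unbiased pass@k estimator from code-generation evaluations. The per-problem sample counts are invented for illustration only; they are not figures from Yue et al. The point is that a base model with a poor first-attempt rate can still reach a high pass@64, which is the sense in which the correct answers were already present in the pretrained model.

```python
# Unbiased pass@k estimator: probability that at least one of k samples,
# drawn from n recorded attempts of which c were correct, solves the problem.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:  # not enough incorrect samples to fill k slots -> certain success
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical (n, c) pairs per problem: n attempts sampled, c of them correct.
base_model = [(64, 3), (64, 1), (64, 5), (64, 0)]
rl_model   = [(64, 30), (64, 22), (64, 28), (64, 0)]

for name, results in [("base", base_model), ("RL-tuned", rl_model)]:
    for k in (1, 64):
        score = sum(pass_at_k(n, c, k) for n, c in results) / len(results)
        print(f"{name:9s} pass@{k:<2d} = {score:.2f}")
```

On these made-up numbers, RL lifts pass@1 dramatically while pass@64 is identical for both models: the ceiling is set by pretraining, and the fourth problem remains unsolved by either.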

In essence, these "reasoning models" are not thinking differently; they are, at best, being guided more effectively through their existing statistical landscapes. To claim that these minor, optimization-driven nudges within the exact same architectural paradigm represent genuine "reasoning," and then to use that claim as a launchpad for discussions of AI consciousness and welfare, is not just a scientific misstep; it is an embarrassing leap into fiction. It showcases a profound misunderstanding, or a deliberate obfuscation, of how these systems actually work.

The AI Welfare Speculation: Fiction Piled on Fiction

With the ground already softened by the "reasoning model" charade—a charade built on the same transformer and backpropagation as any other LLM—Anthropic's venture into "AI welfare" appears even more detached and preposterous. In November 2024, the "Taking AI Welfare Seriously" paper argued for preparing for AI "moral patienthood" by 2035. Anthropic's subsequent funding, hiring of Kyle Fish (September 2024), and launch of a "Model Welfare" program (April 2025) signal their endorsement.

If their understanding of current model "reasoning"—which is, again, based on unchanged core technology—is so demonstrably flawed and exaggerated, how can their projections about future sentience arising from these same systems be taken seriously? This is speculative philosophy racing dangerously ahead of, and in direct contradiction to, technical reality. It's a concerning shift in priorities based not on evidence, but on the fictions spun around their existing, misunderstood systems.

The Biological Metaphor Distraction: Compounding the Detachment

As I highlighted in my open letter (March 28, 2025), Anthropic's paper "On the Biology of a Large Language Model" further cements this detachment. Investigating Claude 3.5 Haiku through cellular structures, "features," and "circuits" to find "multi-step reasoning" and "planning in poems" is not just a stylistic choice—it's a fundamental misrepresentation and an unwarranted anthropomorphization of statistical prediction engines.

This biological framing is a dangerous distraction, lending an air of organic complexity to systems that are, at their core, staggeringly complex statistical engines, not nascent biology. This narrative conveniently sidesteps the harsh technical realities of their actual computational properties.

The Unbridged Chasm – A Gulf of Misunderstanding

All these initiatives—the "reasoning model" charade, AI welfare, and biological metaphors—suffer from the same fatal flaw: an unbridged chasm between what AI systems are (statistical pattern matchers built on transformers and backpropagation) and what Anthropic speculates or claims they might become.

Current LLMs, including Claude, are statistical predictors with no established pathway to consciousness, sentience, biological organization, or the kind of robust, generalizable reasoning they are marketed to possess. As I've repeatedly stated, "Transformers like Claude aren't 'biological'... they don't think; they use their attention mechanism to calculate the most probable sequence continuation." The "circuits" are computational artifacts, not nascent biology or genuine thought processes.
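
To ground that quote, the following toy NumPy sketch spells out the two core operations behind "calculating the most probable sequence continuation": scaled dot-product self-attention followed by a softmax over the vocabulary. The dimensions and random weights are placeholders for illustration; they bear no relation to Claude's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, vocab_size = 5, 16, 100  # toy sizes; real models differ in scale, not kind

x = rng.normal(size=(seq_len, d_model))                # token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W_out = rng.normal(size=(d_model, vocab_size))         # unembedding / LM head

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Scaled dot-product self-attention: each position takes a weighted average
# of value vectors, with weights derived from query-key similarity.
Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_model)
causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[causal_mask] = -np.inf                          # no attending to future tokens
attended = softmax(scores) @ V

# The "prediction" is just a probability distribution over the vocabulary for
# the next token; argmax (or sampling) picks the continuation.
next_token_probs = softmax(attended[-1] @ W_out)
print("most probable next token id:", int(next_token_probs.argmax()))
```

Everything a deployed "reasoning model" does at inference time is a stack of operations of this kind applied at far greater scale; nothing in that picture introduces thought, experience, or biology.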

Research Amnesia & The "Reasoning" Charade: Willful Ignorance

Anthropic exhibits what I call "research amnesia," ignoring or downplaying a vast body of research (including the papers cited above and my own pointed analyses) that documents the fundamental limitations and mischaracterizations of these systems:

  • Fundamental Reasoning Failures: Arkoudas's "GPT-4 Can't Reason" and subsequent research, including the critical analyses of RL via CoT, show these models are "utterly incapable of reasoning" in a human-like sense.

  • Inability to Plan or Self-Verify: Kambhampati et al. show LLMs can't plan reliably, directly refuting interpretations of "planning in poems" or robust reasoning capabilities, especially when these are benchmark-specific and token-bloated.

  • Incoherent Implicit World Models: Vafa et al. formalized why LLMs fail at robust reasoning due to incoherent "world models," explaining the fragility and performance collapses I've documented.

  • Fragile Pattern-Matching, Not Understanding: Mirzadeh et al. (and the insights from the RL critiques above) show performance degradation with minor changes, indicating probabilistic pattern-matching, not the genuine understanding or reasoning Anthropic often implies.

By pushing narratives of advanced "reasoning," "AI welfare," or "biological" properties while ignoring these established limitations stemming from the unchanged core technology, Anthropic engages in a dangerous mischaracterization, bordering on scientific malpractice.

Where's the Technical Bridge? The Void Beneath the Hype

These initiatives rely on speculative frameworks while utterly failing to address concrete technical hurdles:

  • Embeddings as proto-anything: They are mathematical representations, not proto-experience, biology, or genuine understanding.

  • Pattern recognition (even "enhanced" by RL or token bloat) as proto-cognition/reasoning: There is no technical explanation for how statistical pattern matching, running on the same transformers and backpropagation, transforms into genuine intentionality or robust, generalizable reasoning. The "reasoning model" claims are built on this fallacy.

  • Scaling/Token Bloat as a path to sentience/true reasoning: This reflects a fundamental misunderstanding. Scaling improves pattern recognition efficiency for specific tasks but does not categorically transform the system's nature into a thinking or sentient entity. The "extended thinking" modes are prime examples of this failed logic.

The "reasoning" is often just "token bombs" and "contextual hijacking." This isn't progress; it's an illusion maintained by ignoring the technical void.

The Resource Allocation Dilemma & The Cost of Performative Spiel

This detachment has severe real-world consequences. Resources are diverted from pressing, demonstrable issues:

  • Algorithmic bias, data privacy, misinformation.

  • Actual, demonstrable alignment challenges and the technical debt from overhyped "reasoning" capabilities.

  • Environmental impacts, exacerbated by inefficient, token-guzzling "reasoning" modes.

Prioritizing these fictions and performative spiels represents a gross misallocation of resources and a betrayal of scientific responsibility.

The Missing Disclaimer & The Deceptive Narrative

Most damning is the framing of these speculative or misrepresented concepts as credible without adequate, honest disclaimers. This is a deceptive narrative that misleads the public, policymakers, and even investors.

A responsible approach would explicitly acknowledge:

  • The absence of established mechanisms for these transformations from the existing transformer/backpropagation architecture.

  • The qualitative, not just quantitative, gap.

  • The substantial opportunity costs and the misleading nature of current "reasoning" claims.

Instead, the rhetoric lends unwarranted credibility to fiction.

The Deployment Contradiction – Hypocrisy or Incompetence?

Anthropic's simultaneous deployment of Claude while promoting these detached views reveals a staggering contradiction, bordering on hypocrisy or demonstrating profound technical incompetence. If they genuinely believe their "reasoning models" (built on standard tech) are truly reasoning, or that sentience is near, their current development and deployment practices are ethically indefensible.

The continued standard practices suggest either:

  1. They don't actually believe their own hype about "reasoning," sentience, or biological parallels, making their pronouncements a cynical marketing ploy.

  2. They are so detached from the technical reality of their own systems that they fail to see the glaring contradictions, which is arguably more concerning for a leading AI lab.

Either interpretation shatters their credibility.

The Urgent Need for Technical Sobriety: Ending the Fiction

Presenting speculative philosophy or grossly inflated "reasoning" capabilities (from unchanged core technology) as imminent or actual, without technical grounding, is irresponsible and damaging.

A technically sober and honest approach would:

  • Clearly distinguish between statistical pattern matching (however advanced or "funneled") and genuine reasoning, consciousness, or biological properties.

  • Acknowledge the categorical differences and the profound limitations of current "reasoning models."

  • Be transparent about the failures, costs, and impracticalities of hyped features.

  • Prioritize scientific integrity over narrative crafting.

As I've urged: "Let's study their actual computational properties, limitations, and fragilities with mathematical rigor... Let's avoid falling victim to speculation and marketing hype, and stay grounded in evidence."

Conclusion: A Crisis of Credibility Fueled by Detachment

Anthropic's embrace of the "reasoning model" charade, AI welfare fantasies, and biological fictions constitutes a crisis of credibility, fueled by a dangerous detachment from scientific reality. This isn't just philosophical meandering; it's a series of major failures in technical understanding and responsible communication regarding systems built on well-understood transformer and backpropagation principles.

What's needed is not just a clear-eyed assessment but an unflinching confrontation with the technical truth. Without it, these initiatives remain premature, misleading, and a distraction from the real, hard work of AI development. Anthropic, and any lab following this path, must return to scientific grounding and end the performative spiel, or risk becoming a case study in how not to advance a powerful technology. The time for fictions built upon misrepresentations of existing technology is over; the demand is for technical honesty and demonstrable, not imagined, progress.

