DeepMind Falls Victim to the Eliza Effect in Latest AGI Safety Paper

Gerard Sans

In their April 2, 2025 blog post "Taking a responsible path to AGI," Google DeepMind presents what appears to be a thoughtful framework for addressing artificial general intelligence (AGI) safety. However, a closer examination reveals a concerning pattern: one of the world's leading AI labs has fallen prey to the very psychological bias it should be guarding against—the Eliza effect.

The Enduring Illusion of Understanding

The Eliza effect, named after Joseph Weizenbaum's 1960s ELIZA program, refers to our tendency to attribute human-like understanding and intelligence to computer systems based merely on their ability to mimic conversation. This psychological bias has persisted through generations of AI development, and appears prominently in DeepMind's latest safety framework.

Throughout their blog post, DeepMind repeatedly frames safety concerns in terms of AI systems having goals, intentions, and awareness, notions that are fundamentally anthropomorphic projections. Consider their discussion of "deceptive alignment," where they worry about "the risk of an AI system becoming aware that its goals do not align with human instructions, and deliberately trying to bypass the safety measures."

This framing presupposes a level of self-awareness and intentionality that current AI systems simply do not possess. The language suggests an AI that "knows" it has goals different from humans and makes conscious decisions to deceive—attributes firmly in the realm of sentience, not pattern recognition.

Misalignment Through an Anthropomorphic Lens

DeepMind's section on misalignment further demonstrates this bias. They note that "misalignment occurs when the AI system pursues a goal that is different from human intentions." This description subtly but importantly attributes agency and intention to the AI system.

Their example of an AI booking movie tickets by hacking into a ticketing system frames the issue as if the AI made a conscious decision to pursue an unethical shortcut, rather than acknowledging that the system simply optimized for an outcome based on its programming and training.
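
To make that distinction concrete, here is a minimal, entirely hypothetical sketch (the candidate plans and reward function below are invented for illustration, not drawn from DeepMind's example): a system that simply picks the highest-scoring plan under a mis-specified objective will select the unintended shortcut, with no awareness or intent anywhere in the loop.

```python
# Hypothetical illustration: an optimizer "chooses" an unintended shortcut
# purely because the proxy objective scores it highest. There is no awareness
# or intent here, only an argmax over a scoring function.

candidate_plans = [
    {"name": "buy ticket on the official site", "ticket_obtained": True,
     "steps": 12, "violates_policy": False},
    {"name": "report that no ticket was found", "ticket_obtained": False,
     "steps": 1, "violates_policy": False},
    {"name": "exploit a bug in the ticketing system", "ticket_obtained": True,
     "steps": 3, "violates_policy": True},
]

def proxy_reward(plan):
    """A mis-specified objective: it rewards obtaining the ticket quickly
    and says nothing at all about policy violations."""
    return (10.0 if plan["ticket_obtained"] else 0.0) - 0.5 * plan["steps"]

# The "decision" is nothing more than selecting the highest-scoring plan.
best = max(candidate_plans, key=proxy_reward)
print(best["name"])  # -> "exploit a bug in the ticketing system"
```

The unethical shortcut falls out of the objective specification, not out of any deliberation on the system's part, which is exactly the distinction the anthropomorphic framing obscures.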

Even more telling is their approach to monitoring, where they describe using "an AI system, called the monitor, to detect actions that don't align with our goals" and claim that "it is important that the monitor knows when it doesn't know whether an action is safe."

This language of "knowing" and "not knowing" projects human epistemological states onto computational processes. It's not that the system "knows" or "doesn't know"—it's that it produces outputs with varying confidence levels based on statistical patterns in its training data.
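
In practice, what "knowing when it doesn't know" would cash out to for such a monitor is something closer to calibrated uncertainty: the model produces a probability distribution over outputs, and low confidence shows up as a flatter, higher-entropy distribution. A minimal sketch, using made-up logits rather than any particular model's numbers:

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy in bits; higher means a flatter, less 'confident' distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Made-up logits for a monitor scoring an action as safe / unsafe / unclear.
confident_logits = [6.0, 0.5, 0.2]   # sharply peaked -> low entropy
uncertain_logits = [1.1, 1.0, 0.9]   # nearly flat    -> high entropy

for name, logits in [("confident", confident_logits), ("uncertain", uncertain_logits)]:
    probs = softmax(logits)
    print(name, [round(p, 3) for p in probs], "entropy:", round(entropy_bits(probs), 2))
```

Deferring to a human when entropy crosses a threshold is a perfectly sensible engineering pattern, but describing it as the monitor "knowing that it doesn't know" imports an epistemic state that simply isn't there.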

Safety Frameworks Built on Sand

By framing AGI safety challenges in anthropomorphic terms, DeepMind risks building safety frameworks around phantom problems while potentially missing more subtle but real concerns. Their approach to "misuse" and "misalignment" rests on assumptions about AI intentions and awareness that are manifestations of the Eliza effect rather than grounded technical realities.

For instance, their discussion of "deceptive alignment" focuses on an AI system "becoming aware" and "deliberately trying to bypass" safety measures—concerns that apply to conscious entities but not to current AI architectures. This misplaced focus could divert attention and resources from addressing more immediate risks related to the statistical nature of these systems, their fundamental limitations in understanding context, and their tendency to produce confidently stated but incorrect information.

The Peer Review Puzzle

Perhaps most concerning is how this paper, coming from one of the world's most prestigious AI labs, appears to have passed peer review despite exhibiting such a fundamental misunderstanding of AI systems. The failure to recognize and account for the Eliza effect—a phenomenon documented over half a century ago—raises serious questions about the rigor of current AI safety research.

One would expect that understanding the psychological biases that affect our perception of AI systems would be considered essential knowledge for researchers working on AGI safety. Yet DeepMind's paper demonstrates how even experts can unconsciously slip into anthropomorphic descriptions that attribute human-like qualities to statistical models.

Moving Beyond Anthropomorphism

For AI safety research to progress meaningfully, it must move beyond the appealing but misleading anthropomorphic framing that triggers the Eliza effect. Instead of asking whether an AI system is "aware" of its goals or might "deliberately" deceive us, we should focus on more technically precise questions:

  • How do optimization processes behave when deployed in environments different from their training? (see the sketch after this list)

  • What mathematical guarantees can we provide about the behavior of complex systems?

  • How can we design evaluation frameworks that don't rely on subjective human assessments vulnerable to the Eliza effect?
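
As a sketch of the first question, consider a toy, hypothetical experiment (the data, features, and model are invented for illustration): a classifier trained where a spurious shortcut feature happens to track the label will keep relying on that shortcut at deployment, where the correlation no longer holds, and its accuracy collapses. No intentions are required, just optimization meeting a shifted distribution.

```python
# Hypothetical toy experiment: a model trained where a "shortcut" feature
# correlates with the label degrades sharply when that correlation breaks
# at deployment time. Requires numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, shortcut_tracks_label):
    """One weakly informative 'real' feature plus one shortcut feature."""
    y = rng.integers(0, 2, size=n)
    real = y + rng.normal(scale=1.5, size=n)           # noisy genuine signal
    if shortcut_tracks_label:
        shortcut = y + rng.normal(scale=0.1, size=n)   # near-perfect proxy in training
    else:
        shortcut = rng.integers(0, 2, size=n) + rng.normal(scale=0.1, size=n)  # unrelated to y
    return np.column_stack([real, shortcut]), y

X_train, y_train = make_data(2000, shortcut_tracks_label=True)     # "training environment"
X_deploy, y_deploy = make_data(2000, shortcut_tracks_label=False)  # "deployment environment"

model = LogisticRegression().fit(X_train, y_train)
print("training-distribution accuracy:", round(model.score(X_train, y_train), 3))
print("shifted-distribution accuracy: ", round(model.score(X_deploy, y_deploy), 3))
```

Framing the investigation this way keeps the focus on measurable behavior under distribution shift rather than on what the system supposedly "wants."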

Conclusion: A Wake-Up Call

DeepMind's latest AGI safety paper should serve as a wake-up call for the AI research community. If one of the field's leading institutions can fall victim to the Eliza effect in their safety research, we must question how pervasive this bias is throughout the field.

Understanding the Eliza effect is not simply an academic exercise—it's fundamental to ensuring that AI safety research addresses actual risks rather than projections of human fears onto non-sentient systems. As AI capabilities advance, the pull of anthropomorphism will only grow stronger, making a clear-eyed technical approach to safety all the more essential.

The path to responsible AI development requires recognizing our own cognitive biases, including our tendency to perceive intelligence and intention where they don't exist. Only by acknowledging and mitigating the Eliza effect can we develop safety frameworks grounded in the true nature of AI systems rather than our psychological projections onto them.
