Unmasking the Myth of AI Agency Through NotebookLM
In the rapidly evolving landscape of artificial intelligence, we often find ourselves captivated by the seemingly intelligent responses of large language models (LLMs). But beneath the surface of these sophisticated systems lies a fundamental truth: AI lacks true agency.
The Illusion of Self
AI agency is a seductive concept—the belief that these complex systems possess a genuine sense of self or identity. However, this is nothing more than an elaborate illusion. LLMs are, at their core, intricate next-token prediction engines, masterfully disguised through fine-tuning and conversational interfaces.
The Mechanics Behind the Mask
After pre-training, these models are essentially prediction machines. Fine-tuning adds a veneer of instruction-following and conversational ability, but it is a surface-level adjustment rather than a change in what the system fundamentally does. Reinforcement learning from human feedback (RLHF) then steers the model's outputs toward responses that people rate highly, which creates the impression of understanding and agency.
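To make "next-token prediction" concrete, here is a deliberately tiny sketch. It is a bigram model that counts which word follows which in a made-up corpus, then generates text by repeatedly appending the most frequent follower. Real LLMs replace the count table with a neural network over subword tokens and usually sample instead of always taking the top choice. Treat this only as an illustration of the autoregressive loop; the corpus and function names are invented for the example. Nothing in the loop represents a speaker or a self, only statistics about what came before.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for text in corpus:
        tokens = text.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: dict[str, Counter], start: str, max_tokens: int = 10) -> list[str]:
    """Greedy autoregressive decoding: keep appending the most likely next token."""
    tokens = [start]
    for _ in range(max_tokens):
        followers = counts.get(tokens[-1])
        if not followers:  # no continuation was ever observed, so stop
            break
        # Ties go to the follower seen first in the corpus.
        tokens.append(followers.most_common(1)[0][0])
    return tokens


corpus = ["the host asks a question", "the host answers the question"]
model = train_bigram(corpus)
print(" ".join(generate(model, "the")))  # the host asks a question
```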
A Practical Exploration: NotebookLM
To explore these limitations, I turned to Google's NotebookLM, a popular tool for turning source material into podcast-style conversations. The platform illustrates the challenges of creating multi-agent content when true agency is absent.
NotebookLM's Audio Overview feature transforms content from multiple sources into a discussion between two AI hosts who analyse the material, connect topics, and engage in casual banter. This automated podcast generation is technologically impressive, and that makes it an ideal case study for examining the boundaries of artificial agency.
The Challenge of Multiple Personas
When tasked with generating a script featuring two hosts, NotebookLM reveals fundamental weaknesses in AI's ability to maintain distinct identities. The AI struggles with what psychologists call "theory of mind": the capacity to attribute mental states to different agents.
In my own tests, I observed subtle but telling inconsistencies. Hosts would inadvertently adopt each other's personas, answer their own questions, or drift away from their intended voice.
In the podcast example below, listen at the very start (0:12) and again at 0:51. In both places Host-1 asks a question and then answers it herself, breaking the natural turn order twice.
Where AI Falls Short
The primary limitation stems from treating the entire script as a single, undifferentiated unit. Without the ability to maintain separate, coherent internal states, the AI frequently produces scripts with:
Role confusion
Inconsistent persona maintenance
Unexpected voice switching
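Assume the script is available as plain text with one `Speaker: text` turn per line. Under that assumption, the most mechanical of these failures can be flagged with a simple heuristic: two consecutive turns by the same speaker. If the first of those turns ends in a question, it is likely the host answered themselves, as in the example above. This is a hypothetical linter, not part of any existing tool. It only sees speaker labels, so persona drift inside a correctly labelled turn would need a different check.

```python
import re
from dataclasses import dataclass

# Assumed script format: one turn per line, "Speaker: text".
TURN_RE = re.compile(r"^(?P<speaker>[\w-]+):\s*(?P<text>.+)$")


@dataclass
class Issue:
    line: int     # 1-based line number of the offending turn
    speaker: str
    kind: str     # "self-answer" or "repeated-speaker"


def find_turn_issues(script: str) -> list[Issue]:
    """Flag consecutive turns by the same speaker, a symptom of role confusion."""
    turns = []
    for lineno, raw in enumerate(script.splitlines(), start=1):
        match = TURN_RE.match(raw.strip())
        if match:
            turns.append((lineno, match["speaker"], match["text"].strip()))

    issues = []
    for (_, prev_speaker, prev_text), (lineno, speaker, _) in zip(turns, turns[1:]):
        if speaker == prev_speaker:
            # A question followed by the same speaker usually means they answered themselves.
            kind = "self-answer" if prev_text.endswith("?") else "repeated-speaker"
            issues.append(Issue(lineno, speaker, kind))
    return issues


script = """Host-1: Welcome back! So what does agency even mean here?
Host-1: Well, it means acting on your own goals.
Host-2: Right, and that's exactly what a next-token predictor lacks."""
print(find_turn_issues(script))
# [Issue(line=2, speaker='Host-1', kind='self-answer')]
```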
A Path Forward: External Runtime Management
To address these challenges, I propose an external runtime system dedicated to managing host states and interactions. This system would:
Maintain dynamic states for each host
Ensure consistent role adherence
Provide real-time interaction management
Allow scalable, complex conversation scenarios
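Here is a minimal sketch of the state such a runtime might hold, assuming a simple round-robin turn order. `HostState` and `ConversationRuntime` are hypothetical names, not an existing API. Each host's persona and contributions live outside the model, so the runtime, not the generated text, decides who speaks next. A turn from the wrong host is rejected instead of silently accepted. Because the order is just a cycle over the host list, the same structure works for any number of hosts.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class HostState:
    """Per-host state kept outside the model (hypothetical structure)."""
    name: str
    persona: str                                    # e.g. "curious interviewer"
    turns: list[str] = field(default_factory=list)  # everything this host has said


class ConversationRuntime:
    """Owns host state and decides who speaks next (round-robin, for simplicity)."""

    def __init__(self, hosts: list[HostState]):
        self.hosts = {h.name: h for h in hosts}
        self._order = cycle([h.name for h in hosts])
        self.next_speaker = next(self._order)
        self.transcript: list[tuple[str, str]] = []

    def record(self, speaker: str, text: str) -> None:
        """Accept a turn only from the expected speaker, then advance the turn order."""
        if speaker != self.next_speaker:
            raise ValueError(f"{speaker} spoke out of turn; expected {self.next_speaker}")
        self.hosts[speaker].turns.append(text)
        self.transcript.append((speaker, text))
        self.next_speaker = next(self._order)


runtime = ConversationRuntime([
    HostState("Host-1", "curious interviewer"),
    HostState("Host-2", "domain expert"),
])
runtime.record("Host-1", "So, do language models have agency?")
runtime.record("Host-2", "Not in the sense people usually mean.")
try:
    runtime.record("Host-2", "And another thing...")  # out of turn: rejected
except ValueError as err:
    print(err)  # Host-2 spoke out of turn; expected Host-1
```

Round-robin is the simplest policy that rules out self-answering. A richer runtime could let hosts interrupt or hand off explicitly, as long as the decision stays outside the model.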
How It Would Work
By introducing an API-driven external runtime, we can offload the complexity of state management from the AI itself. This system would track prior contributions, validate consistency, and dynamically adjust host interactions.
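The loop below sketches how such a runtime could generate the conversation one turn at a time. For each turn it gives the model only the current speaker's persona and the transcript so far, checks the result, and retries when the turn impersonates another host. The `generate` callable is a hypothetical stand-in for a real LLM call. The consistency rule is deliberately minimal: no other host's label may appear in the text. It is far simpler than the persona validation a production system would need. A fake generator whose first reply drifts into the other host's voice shows the retry in action.

```python
from typing import Callable

# Hypothetical model interface: (speaker, persona, transcript so far) -> next line of dialogue.
# A real system would wrap an LLM API call here.
Generator = Callable[[str, str, list[tuple[str, str]]], str]


def is_consistent(text: str, other_hosts: list[str]) -> bool:
    """Illustrative rule: a turn must not contain another host's speaker label."""
    return not any(f"{name}:" in text for name in other_hosts)


def run_conversation(
    personas: dict[str, str],
    generate: Generator,
    n_turns: int,
    max_retries: int = 2,
) -> list[tuple[str, str]]:
    """Generate one turn at a time, alternating speakers and retrying inconsistent turns."""
    names = list(personas)
    transcript: list[tuple[str, str]] = []
    for i in range(n_turns):
        speaker = names[i % len(names)]  # the runtime, not the model, picks the speaker
        others = [n for n in names if n != speaker]
        for _ in range(max_retries + 1):
            text = generate(speaker, personas[speaker], transcript)
            if is_consistent(text, others):
                break
        else:  # every attempt failed validation
            raise RuntimeError(f"{speaker} could not produce a consistent turn")
        transcript.append((speaker, text))
    return transcript


# Fake generator for demonstration: its first reply drifts into the other host's voice.
calls = 0


def fake_generate(speaker: str, persona: str, transcript: list[tuple[str, str]]) -> str:
    global calls
    calls += 1
    if calls == 1:
        return "Good question! Host-2: I'd say it depends."  # impersonation, gets rejected
    return f"({persona}) turn {len(transcript) + 1}"


result = run_conversation({"Host-1": "interviewer", "Host-2": "expert"}, fake_generate, n_turns=2)
print(result)
# [('Host-1', '(interviewer) turn 1'), ('Host-2', '(expert) turn 2')]
```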
Conclusion: Recognizing the Limits
While AI continues to advance at a remarkable pace, we must remain clear-eyed about its current limitations. The appearance of agency is just that—an appearance. By understanding these constraints, we can develop more nuanced, effective approaches to AI-generated content.
The journey of AI is not about creating artificial beings with true self-awareness, but about developing increasingly sophisticated tools that augment human creativity and communication.
Written by
Gerard Sans
I help developers succeed in Artificial Intelligence and Web3; Former AWS Amplify Developer Advocate. I am very excited about the future of the Web and JavaScript. Always happy Computer Science Engineer and humble Google Developer Expert. I love sharing my knowledge by speaking, training and writing about cool technologies. I love running communities and meetups such as Web3 London, GraphQL London, GraphQL San Francisco, mentoring students and giving back to the community.