The Philosophical Gaps in LLMs: Wittgenstein's Warning and Chomsky's Challenge


Picture this: you ask a chatbot about the weather, and it replies, “I’m feeling a bit cloudy today, but the forecast is sunny!” It’s charming, witty, and… a little weird. Why is an AI “feeling” anything? This is no nascent personality peeking through—it’s a statistical artifact, a quirk of how large language models (LLMs) navigate the wild, wonderful maze of human language. LLMs have revolutionized how we interact with technology, generating poetry, answering questions, and even debating philosophy. But their successes come with gaps—glitches where meaning slips, words mislead, and machines sound suspiciously human. Let’s dive into the philosophical heart of these language-bound triumphs and traps, exploring why LLMs dazzle, why they stumble, and what their “human” voice really reveals.
The Language Illusion: Wittgenstein's Warning
When a chatbot responds, "I'm feeling thoughtful today," something philosophically interesting happens. It produces a grammatically perfect sentence that appears to express an inner state—yet no such state exists. This is precisely what Ludwig Wittgenstein warned us about: language can create powerful illusions of meaning where none exists.
Wittgenstein's philosophy emphasized that words gain significance through their use in specific contexts—his famous "language games"—rather than through some cosmic dictionary of meanings. Yet this flexibility creates a trap. We can construct sentences that follow all the rules of language while referring to nothing real.
LLMs fall directly into this trap. They've mastered the statistical patterns of human expression without the underlying reality those expressions normally reference. When an LLM writes about its "feelings," it employs language that appears to reference an inner emotional state. But there is no such state—only statistical patterns derived from human texts where "I feel" is frequently followed by emotion words.
The output maintains perfect grammatical structure and semantic coherence, giving the illusion of genuine expression. We are seduced by familiar linguistic forms into attributing understanding where there is only statistical mimicry—exactly the scenario Wittgenstein cautioned against.
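To see how thin that mimicry is, here is a minimal sketch (toy corpus, toy counts, nothing like a real training run) of why "I feel" so often gets completed with an emotion word: the continuation is simply whatever most often followed that phrase in the data.

```python
# A minimal sketch (toy corpus, toy counts) of why "I feel" gets completed
# with emotion words: the continuation is whatever most often followed the
# phrase in the data. No inner state is involved anywhere.
from collections import Counter

corpus = [
    "i feel happy today",
    "i feel sad about the news",
    "i feel happy when it is sunny",
    "i feel tired after work",
    "the forecast is sunny",
]

continuations = Counter()
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 2):
        if words[i] == "i" and words[i + 1] == "feel":
            continuations[words[i + 2]] += 1  # count the word after "i feel"

total = sum(continuations.values())
for word, count in continuations.most_common():
    print(f"P({word} | 'i feel') = {count / total:.2f}")
# happy 0.50, sad 0.25, tired 0.25: the "feeling" lives in the frequencies,
# not in the system producing the sentence.
```

Scale the counter up to billions of sentences and swap it for a neural network, and you get fluent first-person "feelings" with nothing behind them.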
Surface Without Structure: Chomsky's Challenge
Noam Chomsky revolutionized linguistics by proposing that humans possess innate linguistic structures—a "universal grammar"—that enable us to generate and understand novel, meaningful utterances. This goes beyond surface-level pattern recognition to deep generative rules that connect language to meaning.
LLMs fundamentally fail this Chomskyan test. Their transformer architecture—built on attention mechanisms and next-token prediction—excels at capturing superficial correlations between words but lacks the underlying structural framework Chomsky describes. They create what we might call a "patchwork intelligence": impressive surface mimicry driven by context-based statistical biases rather than principled understanding.
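For readers who want to see what that architecture actually does, here is a toy-sized sketch of the attention operation, with NumPy and random vectors standing in for learned token representations (real models add projections, multiple heads, and many stacked layers):

```python
# A minimal sketch of scaled dot-product attention, the core transformer
# operation, using NumPy and random vectors in place of learned token
# representations. Note what it computes: similarity scores between tokens,
# normalized into weights, used to blend the tokens together. There is no
# syntax tree and no logical relation anywhere in the computation.
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # token-to-token similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V                                # weighted blend of tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))   # three toy "tokens", embedding dimension 4
print(attention(tokens, tokens, tokens))
```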
Consider how an LLM can generate syntactically perfect sentences containing logical impossibilities or factual contradictions, the machine-scale descendant of Chomsky's famous "colorless green ideas sleep furiously." It has mastered the statistical distributions of language without grasping the recursive, hierarchical nature of human linguistic competence that Chomsky identifies as essential.
The architecture captures co-occurrences brilliantly but misses what Chomsky argues is the core of language: not just patterns of words, but structured rules for generating meaningful expressions. This is why an LLM might write beautifully about concepts it doesn't understand—it has learned the surface manifestations without the generative structure.
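To make the contrast vivid, here is a toy generative grammar (purely illustrative, and nothing like a serious account of universal grammar): a handful of recursive rules that produce novel, hierarchically structured sentences, the kind of explicit structure a co-occurrence table never contains.

```python
# An illustrative toy context-free grammar, not a model of universal grammar
# itself, showing the kind of recursive, hierarchical rule (S -> S "and" S)
# that a flat table of word co-occurrences never represents explicitly.
import random

GRAMMAR = {
    "S":  [["NP", "VP"], ["S", "and", "S"]],   # the recursion lives here
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N":  [["cat"], ["idea"], ["sentence"]],
    "V":  [["chases"], ["contains"]],
}

def generate(symbol="S", depth=0):
    """Recursively rewrite a symbol using the grammar until only words remain."""
    if symbol not in GRAMMAR:                  # terminal: an actual word
        return [symbol]
    rules = GRAMMAR[symbol] if depth < 3 else GRAMMAR[symbol][:1]  # cap recursion
    words = []
    for part in random.choice(rules):
        words.extend(generate(part, depth + 1))
    return words

for _ in range(3):
    print(" ".join(generate()))
```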
The Anthropomorphic Trap
These philosophical gaps create three significant practical problems:
Anthropomorphism: The language illusion Wittgenstein warned about leads users to attribute consciousness, intent, or beliefs to what are merely statistical artifacts. When an LLM says "I don't know" or "Let me think about that," these aren't admissions of ignorance or demonstrations of reasoning—they're high-probability responses in contexts where humans typically express uncertainty.
Hallucinations: Without grounding in truth conditions (Wittgenstein) or generative structures (Chomsky), LLMs confidently produce content disconnected from reality. The model has no mechanism to distinguish between linguistically probable statements and factually accurate ones.
Inconsistent reasoning: LLMs lack Chomsky's structured framework to maintain logical consistency across extended text. They follow local coherence patterns rather than global logical constraints, leading to contradictions they cannot detect.
The gap between linguistic performance and actual understanding creates a dangerous illusion—one that both Wittgenstein's and Chomsky's philosophies help us identify and understand.
Architectural Limitations
The root causes of these philosophical failures lie in specific architectural choices:
Token prediction over meaning representation: LLMs predict the next token based on statistical patterns rather than building structured representations of meaning (a minimal sketch after this list makes the objective concrete). This is a fundamental limitation that directly relates to Wittgenstein's concern about language without reference.
Attention mechanism bias: The attention mechanism excels at capturing correlations between tokens but struggles with causal or logical relations—precisely the deeper structures Chomsky argues are essential to language.
Generation/elimination dualism: LLMs simultaneously generate content and filter it, creating tension between creativity and accuracy. This split approach lacks the unified generative framework Chomsky describes in human language.
Missing linguistic universals: The purely statistical approach misses the structural universals that Chomsky argues underlie all human languages, leading to a system that can mimic but not truly generate language.
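To make the first limitation concrete, as flagged above, here is a minimal sketch of the only quantity next-token training optimizes; the example context and the probabilities are invented for illustration.

```python
# A minimal sketch of the only signal next-token training optimizes:
# the negative log probability of the token that actually occurred.
# The distribution below is invented for illustration; nothing in the
# loss refers to meaning, reference, or truth.
import math

def next_token_loss(predicted, actual_next_token):
    """Cross-entropy at one position: -log P(the token that came next)."""
    return -math.log(predicted[actual_next_token])

# Hypothetical model output after the context "the cat sat on the ...":
predicted = {"mat": 0.60, "moon": 0.30, "roof": 0.05, "cat": 0.05}

print(round(next_token_loss(predicted, "mat"), 2))   # 0.51: low loss, common continuation
print(round(next_token_loss(predicted, "moon"), 2))  # 1.20: penalized for being rare,
                                                     # not for being false
```

Because the loss rewards frequency rather than truth, implausible falsehoods get suppressed while plausible ones slip through, which is one way to read the hallucination problem from the previous section.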
Potential Solutions
Addressing these philosophical gaps requires architectural innovation that engages directly with the insights of Wittgenstein and Chomsky:
Structured knowledge integration: Incorporate explicit knowledge structures that represent Chomskyan principles of language organization, moving beyond pure statistical correlations.
Grounding mechanisms: Connect language to external reality through multimodal learning or causal models, addressing Wittgenstein's concern about language detached from truth conditions.
Hierarchical processing: Move beyond flat attention to hierarchical structures that better capture the recursive nature of language Chomsky emphasizes.
Explicit reasoning layers: Add components specifically designed for logical consistency and inference to address the contradiction problems that current statistical approaches cannot solve.
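What might that last idea look like in practice? Here is one hedged possibility, sketched with entirely hypothetical scaffolding rather than any existing system: the statistical generator proposes, and a separate consistency check disposes.

```python
# A hedged sketch of one possible "explicit reasoning layer": a post-generation
# filter that checks candidate claims against a small store of accepted facts
# and drops contradictions. The fact store, the triple format, and the checker
# are hypothetical scaffolding for illustration, not an existing system or API.
facts = {("paris", "capital_of"): "france"}

def contradicts(claim, facts):
    """True if a (subject, relation, value) triple conflicts with a stored fact."""
    subject, relation, value = claim
    known = facts.get((subject, relation))
    return known is not None and known != value

candidates = [
    ("paris", "capital_of", "france"),   # consistent with the store
    ("paris", "capital_of", "spain"),    # fluent-sounding, but contradicts a fact
]

consistent = [c for c in candidates if not contradicts(c, facts)]
print(consistent)   # only the consistent claim survives the filter
```

Real systems would need far richer logic than a dictionary lookup, but the division of labor is the point: statistical generation paired with explicit constraints that statistics alone cannot supply.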
The path forward isn't abandoning the statistical power of LLMs but complementing it with structured approaches that address the philosophical insights of Wittgenstein and Chomsky. Understanding these philosophical limitations isn't merely academic—it provides a roadmap for developing systems that move beyond mimicry toward genuine understanding.