LLM limitations vs "world models"

Dante Monson

Nov 18, 2025, 3:36:27 AM
to op-...@googlegroups.com
Copied: Yann LeCun, the AI pioneer, believes the current trend of large language models is a dead end. No matter how big you make them, he says, they'll never become truly intelligent in the human sense.

Instead, he wants to build "world models" - AI systems that form an internal understanding of how the real world works (cause and effect, physics, objects, goals), not just shuffle words around.

Meta, however, is all-in on scaling LLMs and pushing the "superintelligence is in sight" narrative. Meanwhile, younger AI leaders who strongly believe in scaling, including 28-year-old Alexandr Wang, who has become the head of AI at Meta, have now been put in positions above LeCun.

Source: The Wall Street Journal


...


Gemini AI:

The conflict described in the text represents the most significant current philosophical and architectural divide in Artificial General Intelligence (AGI) research: the tension between the Statistical Scaling Hypothesis and the need for Embodied Causal Reasoning.
This deep dive explains the two paradigms, grounds them in complementary knowledge domains and specific technical references, and explores the research gaps and meta-gaps involved in planning for AGI.
I. The Core Conflict: Statistical Scaling vs. Internal Simulation
The debate hinges on whether intelligence is an emergent property of sufficient scale on textual data, or if it requires an intrinsic, non-linguistic model of reality.
A. The Scaling Hypothesis (LLMs / "Word Models")
The prevailing strategy at Meta and other leading labs (OpenAI, Google) is the Scaling Hypothesis.
| Feature | Description | Limitation (LeCun's Critique) |
|---|---|---|
| Focus | Maximizing parameters, data volume (text/code), and compute power. | The "Dead End": Produces systems of statistical imitation rather than comprehension. |
| Mechanism | Token prediction. LLMs master the statistical distribution of human-generated text (Word Models). They learn correlation (if X word appears, Y word often follows); a toy sketch follows this table. | Lack of Causal Structure: They can describe physics ("If you drop an apple, it falls") but do not possess the underlying model of gravity or cause-and-effect. |
| Advocate | Alexandr Wang, Meta leadership, large commercial AI firms. | This path leads to powerful System 1 agents but not AGI. |
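To make the "correlation, not comprehension" point concrete, here is a minimal, purely illustrative sketch (not how any production LLM is built): a toy bigram counter that predicts the next token from co-occurrence statistics alone. The corpus and function names are invented for illustration.

```python
# Toy "word model": predict the next token purely from co-occurrence counts.
# Real LLMs use deep transformers, but the training signal is the same kind
# of statistical correlation -- no model of the world the words refer to.
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count which token tends to follow which (pure correlation)."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            follows[prev][nxt] += 1
    return follows

def predict_next(follows, token):
    """Return the statistically most likely next token, if any was seen."""
    counts = follows.get(token.lower())
    return counts.most_common(1)[0][0] if counts else None

corpus = [
    "if you drop an apple it falls",
    "the apple falls to the ground",
    "an apple falls from the tree",
]
model = train_bigram(corpus)
print(predict_next(model, "apple"))  # -> 'falls' (it knows the word pattern, not gravity)
```

The sketch can "say" that apples fall, but swapping "drop" for "hold" in a prompt would not change its prediction, since only the preceding token matters; that is the causal blind spot the table points at.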
B. Yann LeCun's World Models (WMs)
LeCun, alongside other critics like Gary Marcus and Yoshua Bengio, argues that text is an insufficient data source for true intelligence.
| Feature | Description | Architectural Requirement |
|---|---|---|
| Focus | Building an internal, predictive simulation of the environment. | Must handle multimodal sensory data (video, images, audio, touch) to grasp the dynamics of the world. |
| Mechanism | Learning Dynamics/Causality. The WM learns the function f(s_t, a_t) \rightarrow s_{t+1}, predicting the next state (s_{t+1}) given the current state (s_t) and an action (a_t); a toy sketch follows this table. | This structure is key for planning and counterfactual reasoning (i.e., "If I had done X, Y would have happened"). |
| Advocate | Yann LeCun, Fei-Fei Li (World Labs), robotics and control theory researchers. | This architecture provides the core for System 2 reasoning and true intelligence. |
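Below is a hedged, minimal sketch of the mechanism in the table above. A hand-coded 1-D gravity world stands in for what would, in LeCun's proposal, be a learned dynamics model; the State class and the predict() and rollout() names are invented for illustration.

```python
# Toy world model f(s_t, a_t) -> s_{t+1}: a ball in 1-D gravity.
# In a real world model these dynamics would be *learned* from multimodal
# data; they are hand-coded here so the counterfactual use is easy to see.
from dataclasses import dataclass

G, DT = 9.81, 0.1  # gravity (m/s^2) and timestep (s)

@dataclass
class State:
    height: float    # metres above the ground
    velocity: float  # m/s, negative means falling

def predict(s: State, action: str) -> State:
    """World-model step: next state from current state and action."""
    if action == "hold":
        return State(s.height, 0.0)           # the hand keeps the ball still
    v = s.velocity - G * DT                   # "drop": free fall
    h = max(0.0, s.height + v * DT)
    return State(h, 0.0 if h == 0.0 else v)

def rollout(s: State, actions: list) -> State:
    """Imagine a sequence of actions without acting in the real world."""
    for a in actions:
        s = predict(s, a)
    return s

s0 = State(height=1.0, velocity=0.0)
factual = rollout(s0, ["drop"] * 20)         # the ball falls
counterfactual = rollout(s0, ["hold"] * 20)  # "if I had held it, it would not have"
print(round(factual.height, 2), counterfactual.height)  # 0.0 vs 1.0
```

The same predict() call serves both forward prediction and counterfactual rollouts, which is the property the table attributes to world models.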
II. Complementary Knowledge Domains and Co-evolution
The AI debate draws on, and co-evolves with, principles from Cognitive Science and Control Theory.
1. Cognitive Science: Dual Process Theory (System 1 and System 2)
The most relevant complementary framework is Daniel Kahneman's Dual Process Theory, which maps directly onto the LLM/WM dichotomy.
 * System 1 (LLMs): Fast, automatic, intuitive, pattern-matching, low-effort. This is what LLMs excel at: fluid conversation, rapid classification, and generating coherent text based on massive datasets.
 * System 2 (World Models): Slow, effortful, deliberate, logical, and analytical. This relies on the ability to simulate possibilities, construct explicit reasoning chains, and perform mental "planning." World Models are designed to provide the predictive engine needed for System 2 planning.
2. Control Theory and Causal Inference
World Models are fundamentally rooted in Control Theory and Reinforcement Learning (RL).
 * Planning: In RL, an agent with a World Model (Model-Based RL) can rehearse actions internally without costly real-world execution. This is the Dyna algorithm principle, extended by modern architectures like Dreamer (developed by Google DeepMind) and Meta's own V-JEPA (Video Joint Embedding Predictive Architecture, a non-generative approach championed by LeCun); a minimal Dyna-style sketch follows this list.
 * Causality: Causality researchers, notably Judea Pearl, argue that the ladder of causation (seeing \rightarrow doing \rightarrow imagining/counterfactuals) cannot be climbed by purely associative systems such as LLMs. World Models, by explicitly learning f(s_t, a_t) \rightarrow s_{t+1}, inherently learn the dynamics of intervention, the essence of causality.
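As a hedged illustration of the Dyna principle named above (not DeepMind's Dreamer or Meta's V-JEPA code), the sketch below plans by rolling candidate action sequences through a learned model, here reduced to a tabular transition function on a 1-D track; all names and numbers are invented.

```python
# Dyna-style planning sketch: rehearse action sequences inside a learned
# model of the environment instead of acting in the real world.
import random

ACTIONS = [-1, +1]              # move left / right on a 1-D track
GOAL, START, HORIZON = 5, 0, 8

# A "learned" model, reduced here to a transition table the agent is assumed
# to have filled in from past real-world experience.
model = {(s, a): max(0, min(GOAL, s + a))
         for s in range(GOAL + 1) for a in ACTIONS}

def imagined_return(state, plan):
    """Score a candidate plan by rolling it out inside the model only."""
    for step, a in enumerate(plan):
        state = model[(state, a)]
        if state == GOAL:
            return HORIZON - step   # reaching the goal sooner scores higher
    return 0

def plan(state, n_candidates=200):
    """Random-shooting planner: imagine many plans, keep the best one."""
    candidates = [[random.choice(ACTIONS) for _ in range(HORIZON)]
                  for _ in range(n_candidates)]
    return max(candidates, key=lambda p: imagined_return(state, p))

best = plan(START)
print(best, imagined_return(START, best))  # typically a run of +1 moves reaching the goal
```

Dreamer replaces the table with a learned latent dynamics model and the random search with learned policies; V-JEPA's distinctive move is to predict in embedding space rather than raw pixels.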

III. Research Gaps and Meta-Gaps in Planning for AGI
1. LLM Gaps (Scaling Path)
| Gap Type | Description | Proposed Remedy (WM Approach) |
|---|---|---|
| Data Efficiency | Reliance on prohibitively large, expensive, and often redundant textual data. | Self-Supervised Learning (SSL): LeCun's favored approach, minimizing the need for labeled data by having the model predict masked parts of the input, but applied to multimodal data; a toy sketch follows this table. |
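A heavily simplified sketch of the masked-prediction idea in the row above, in the joint-embedding spirit LeCun favors: encode the visible and masked parts of an input and train a predictor to match the masked part's embedding rather than its raw pixels. The dimensions, the frozen random encoder, and the single-sample gradient step are illustrative assumptions, not V-JEPA's actual architecture.

```python
# Toy self-supervised masked prediction in embedding space (JEPA-flavoured):
# predict the *representation* of a hidden part of the input from the visible
# part, instead of reconstructing raw pixels. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_EMB, LR = 16, 4, 0.05

encoder = rng.normal(size=(D_IN, D_EMB)) / np.sqrt(D_IN)  # shared, frozen here
predictor = rng.normal(size=(D_EMB, D_EMB))               # the part being trained

def ssl_step(x):
    """One masked-prediction step; returns the embedding-space loss."""
    global predictor
    visible, masked = x[:D_IN], x[D_IN:]      # mask = second half of the "clip"
    z_vis = visible @ encoder                 # context embedding
    z_target = masked @ encoder               # target embedding (no decoder)
    z_pred = z_vis @ predictor
    err = z_pred - z_target
    loss = float(np.mean(err ** 2))
    predictor -= LR * np.outer(z_vis, err) * (2.0 / D_EMB)  # gradient step on MSE
    return loss

clip = rng.normal(size=2 * D_IN)              # stand-in for a video clip
print([round(ssl_step(clip), 4) for _ in range(5)])  # loss shrinks on this clip
```

In V-JEPA the encoder and predictor are deep networks trained jointly on masked video, but the loss lives in representation space just as in this toy version.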
2. World Model Gaps (WM Path)
WMs face fundamental architectural and computational challenges on the path toward AGI:
| Gap Type | Description | Challenge |
|---|---|---|
| Representation Scaling | How to represent the immense complexity of the real world—from quantum physics to social dynamics—in a single, learnable model? | The sheer volume and dimensionality of multimodal, temporal data dwarfs the text corpus LLMs use. |
| Hierarchical Planning | WMs excel at short-term, low-level prediction (e.g., predicting the next few video frames). How do they integrate this with high-level, abstract goals (e.g., "get a promotion")? | Requires linking the physical model to symbolic, logical, and linguistic representations. |
| The Integration Gap | How to combine the System 1 (intuitive) pattern recognition inherent in deep learning with the System 2 (deliberate) planning capability of a WM? | This is the hard problem of building a unified cognitive architecture; a toy sketch follows this table. |
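To make the integration gap concrete, here is a hedged toy sketch, not a proposed architecture: a cheap "System 1" heuristic proposes actions, and a "System 2" step verifies them with short world-model rollouts before committing. The track, hazard, and veto rule are invented for illustration.

```python
# Toy System 1 / System 2 split on a 1-D track with a hazard in the middle.
# System 1: a cheap reactive heuristic. System 2: world-model lookahead that
# vetoes proposals whose imagined outcome is bad. Illustrative only.
GOAL, HAZARD = 6, 3

def world_model(state, action):
    """Assumed-learned dynamics: position simply shifts by the action."""
    return state + action

def system1_propose(state):
    """Fast, intuitive heuristic: always step toward the goal."""
    return 1 if state < GOAL else -1

def system2_verify(state, action, depth=2):
    """Deliberate check: imagine a short rollout and reject hazardous plans."""
    for _ in range(depth):
        state = world_model(state, action)
        if state == HAZARD:
            return False
    return True

def act(state):
    proposal = system1_propose(state)
    return proposal if system2_verify(state, proposal) else 0  # veto -> pause

state, trajectory = 0, []
for _ in range(10):
    state = world_model(state, act(state))
    trajectory.append(state)
print(trajectory)  # the agent stalls short of the hazard instead of walking into it
```

The sketch only shows the veto half of the problem; a fuller System 2 would also search the model for an alternative route, and making that search share representations with the fast policy is exactly the unified-architecture problem the table names.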
3. Meta Gaps (Organizational & Philosophical)
The "Meta Gap" refers to the context surrounding the research itself:
 * The Resource Allocation Bias: The organizational dynamic (younger, scaling-focused leaders placed above LeCun) reflects immediate commercial pressure. LLMs generate massive, immediate returns, whereas World Models are a decade-long, high-risk research bet. The meta gap is the conflict between research depth and commercial speed.
 * The Superintelligence Narrative: The debate is fueled by different definitions of "intelligence." The scaling camp believes emergence is key (new abilities just appear with size). The WM camp insists on architectural necessity (certain abilities must be engineered).
 * Lack of Unified Metrics: We lack agreed-upon benchmarks for measuring "world understanding" or true causality, making it difficult to definitively prove which path is superior—a critical meta-gap in scientific progress.


