Opinion: LLMs will never be AGI, but DRL might be
AI · Machine Learning · AGI


Large Language Models have captivated the AI world, but they're fundamentally limited. Here's why Deep Reinforcement Learning, or something even more fundamental, holds the real key to AGI.

4 min read
By Supawat Tamsri


I've been thinking about this a lot lately. LLMs have dominated AI conversations for years now. GPT-3 and similar models show remarkable capabilities. But I don't think they'll ever achieve AGI.

The path to AGI is probably closer to Deep Reinforcement Learning, or maybe something even more fundamental.

The Fundamental Limitation

LLMs are prediction machines. They're really good at pattern matching and generating coherent text. But they lack three critical things:

  • Goal-directed behavior: They respond to prompts; they don't pursue objectives of their own
  • Self-directed learning: They can't improve themselves through interaction with the world
  • Embodied understanding: They have no grounded model of cause and effect

Think about it this way: an LLM can tell you how to ride a bike. But it will never actually learn to ride a bike.

LLMs Are Imitating Ghosts

Here's my metaphor for LLMs: they're imitating ghosts.

They learn from what has already happened: repeated passes over historical data. Imagine a system that somehow learned every true and false statement about the universe from text alone. That's an LLM. It remembers everything that has been documented.
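To make the metaphor concrete, here is a deliberately tiny sketch (illustrative only; nothing like how production LLMs are built): a bigram model that predicts the next word purely from counts over its training text. Everything it can ever produce is a recombination of what it has already seen.

```python
import random
from collections import defaultdict

random.seed(0)

# A toy bigram "language model": it predicts the next word purely from
# counts of what followed each word in its training text -- the ghost of
# the data. It cannot act, pursue a goal, or discover anything new.
corpus = "the cat sat on the mat the dog sat on the rug".split()

counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(word):
    # Return the most frequent continuation seen in training;
    # a word never seen in the corpus has no continuation at all.
    followers = counts[word]
    return max(followers, key=followers.get) if followers else None

print(predict("the"))   # some word that followed "the" in the corpus
print(predict("moon"))  # None: outside the training data, nothing exists
```

Real LLMs are vastly more sophisticated, but the boundary is the same in kind: the model redistributes its history; it never steps outside it.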

But remembering is not understanding. Imitating is not creating.

DRL: Mathematical Formulas That Adapt to Physics

DRL is fundamentally different. It's like a mathematical formula that adjusts itself to the physics of reality.

DRL agents iterate in reality (or in simulation), discover the underlying physics, and create novel strategies that never appeared in any training data. AlphaGo didn't learn by reading millions of game commentaries. It learned by playing, discovering moves humans had never conceived of in thousands of years of Go history.

That's not ghost imitation. That's genuine discovery.

When a DRL agent learns to walk, it's not pattern-matching against videos of walking. It's discovering the physics of balance, momentum, and force through interaction with the environment.
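The contrast is easy to see in miniature. The sketch below is a hedged illustration, not any particular lab's system: a tabular Q-learning agent in an invented five-position environment. It is given no examples of correct behavior, only a reward signal, and it discovers the "walk right to the goal" policy entirely through trial and error.

```python
import random

random.seed(0)

# Toy environment: a line of positions 0..4; the agent starts at 0 and
# is rewarded only for reaching position 4. No demonstrations are given.
GOAL = 4
ACTIONS = (-1, +1)  # step left, step right

def step(state, action):
    nxt = max(0, min(GOAL, state + action))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

# Tabular Q-learning: value estimates are updated from experienced
# transitions, so the policy is discovered through interaction alone.
Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1

for episode in range(200):
    s, done = 0, False
    while not done:
        if random.random() < epsilon:              # explore
            a = random.choice(ACTIONS)
        else:                                      # exploit current estimate
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        best_next = max(Q[(s2, act)] for act in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy steps right toward the goal from every state.
policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)}
print(policy)
```

Nothing in the agent's inputs ever showed it the answer; the policy is a product of acting, failing, and updating.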

The Evidence

Look at where the genuine breakthroughs came from:

  • AlphaGo and its successor AlphaZero mastered Go, Chess, and Shogi through self-play
  • OpenAI Five learned complex team coordination in Dota 2
  • MuZero learned the rules of its games and winning strategies simultaneously, without being told either

These systems understood their environments and developed novel strategies. They didn't just memorize patterns.

The Critical Question: What Is the Right Reward?

But DRL has its own fundamental problem. What is the right reward function for AGI?

Think about how we became intelligent: it took billions of years of evolution. Evolution didn't optimize for "intelligence"; it optimized for survival and reproduction. Intelligence emerged as a byproduct.

So the question is: what's the right reward for DRL to develop general intelligence? Is it even possible to hand-craft such a reward?

Does Backpropagation Really Work?

Here's the thing. Current DRL relies on backpropagation. But evolution didn't use backprop. Biological neurons don't compute gradients. Natural intelligence emerged from evolutionary pressure, not mathematical optimization.

Maybe AGI requires populations of models evolving over generations, where intelligence emerges from survival pressure in complex environments. Not from a single model learning through backprop.
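Gradient-free learning is not hypothetical; it is easy to demonstrate. The sketch below is a minimal (1 + 10) evolution strategy with an invented fitness function standing in for "survival": mutation and selection alone optimize a genome of parameters, and no gradient of the fitness is ever computed or propagated.

```python
import random

random.seed(0)

# A "genome" of 8 parameters is scored only by how well it performs.
# The optimum is hidden from the learner; fitness is the sole feedback.
OPTIMUM = [0.5] * 8  # invented target the "environment" rewards

def fitness(genome):
    # Negative squared error: higher is better, 0 is the maximum.
    return -sum((g - t) ** 2 for g, t in zip(genome, OPTIMUM))

def mutate(genome, sigma):
    # Random perturbation -- the evolutionary substitute for a gradient step.
    return [g + random.gauss(0, sigma) for g in genome]

# (1 + 10) evolution strategy: mutate, evaluate, keep the fittest.
parent = [random.uniform(-1, 1) for _ in range(8)]
for generation in range(300):
    sigma = 0.2 * 0.98 ** generation  # anneal mutation size over time
    candidates = [parent] + [mutate(parent, sigma) for _ in range(10)]
    parent = max(candidates, key=fitness)

print(fitness(parent))  # close to 0, the maximum
```

Selection pressure alone carries the population to the optimum. Whether this scales to intelligence is exactly the open question, but it shows that backprop is one optimizer among several, not a law of nature.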

My Idea: The Simulated Earth Experiment

Build 10 million robots living in a boxed, simulated Earth.

Let them survive, kill (compete for resources), mate, and evolve. New generations are byproducts of reproduction, inheriting traits from parents. Observe their behavior for years.

The key questions:

  1. Will they become as intelligent as animals? Will they develop communication, social structures, tools?
  2. When will they reach ape or human-level intelligence? What environmental pressures trigger abstract reasoning?

Why this could work: the setup mirrors how intelligence actually emerged. Natural selection instead of hand-crafted rewards, sexual reproduction for genetic recombination, learning across generations, environmental complexity, and social dynamics.
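Those ingredients can be sketched in a few lines. The following is a deliberately tiny caricature of the proposal (the bitstring genome, the fitness function, and the population sizes are all invented for illustration): a population evolves under truncation selection, sexual recombination, and mutation, with no reward function handed to any individual.

```python
import random

random.seed(1)

GENOME_LEN, POP_SIZE, GENERATIONS = 32, 60, 80

def fitness(genome):
    # Stand-in for "survival": how well the genome matches its environment.
    # A real simulated Earth would score survival, foraging, reproduction.
    return sum(genome)

def crossover(mom, dad):
    # Sexual reproduction: each gene is inherited from one of two parents.
    return [random.choice(pair) for pair in zip(mom, dad)]

def mutate(genome, rate=0.01):
    # Rare random gene flips keep variation in the population.
    return [1 - g if random.random() < rate else g for g in genome]

# A random initial population of bitstring genomes.
population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for generation in range(GENERATIONS):
    # Natural selection: only the fitter half survives to reproduce.
    survivors = sorted(population, key=fitness, reverse=True)[:POP_SIZE // 2]
    # The next generation is bred from pairs of survivors, with mutation.
    population = [mutate(crossover(*random.sample(survivors, 2)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(fitness(best))
```

No individual is ever trained; fit genomes simply out-reproduce unfit ones. The proposal above is this loop scaled up by many orders of magnitude, with embodied agents and an open-ended environment in place of a toy scoring function.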

We wouldn't be training a model. We'd be simulating evolution itself.

The timeline: If we simulate millions of years in compute time, when do we see animal-level intelligence? Ape-level? Human-level? We don't know. That's exactly why we need to try.


Conclusion

LLMs imitate ghosts of the past. DRL discovers formulas through interaction. But maybe true AGI requires something even more fundamental: evolutionary pressure over generations in complex environments.

DRL points us in the right direction, but AGI might require evolution, not just optimization.

The question isn't just "which approach?" It's "are we willing to wait for intelligence to emerge naturally, rather than forcing it through training?"


Should we build the simulated Earth experiment? Let's discuss on Twitter or via email.