Last week, on February 18, AGI House hosted a dinner on frontier AI research.
Around the table were researchers who helped build early RL systems at major labs, architects of open-source model ecosystems, distributed systems pioneers, robotics builders, AI-for-science founders, professors, investors, and infrastructure operators. The conversation spanned robotics simulation, parallel intelligence, self-improving systems, Chinese hardware supply chains, coding agents, world models, and whether someone in the room might accidentally trigger a $100B public company collapse within two months.

Below is a synthesis of the themes that emerged.
The Frontier Is Shifting: From Bigger Models to Better Systems
The conversation quickly moved beyond “just scale it.”
Yes, scaling laws still hold directionally. Yes, frontier training runs cost hundreds of millions of dollars. But the deeper questions now are:
- How do we move from sequential intelligence to parallel intelligence?
- How do we embed time, memory, and continual learning directly into architectures?
- How do we scale systems, not just models?
- How do we make AI operate robustly in the real world — not just benchmarks?
The tone wasn’t hype-driven. It was architectural.
Simulation Is About to Have Its LLM Moment
One of the strongest themes of the evening: we are dramatically underinvesting in simulation — especially for robotics.
Several participants argued:
- Physics engines are underutilized.
- Teleoperation doesn’t scale.
- We’re still overly dependent on real-world data collection.
Prediction: Within 2–3 years, robotics simulation will hit a breakthrough moment similar to what LLMs experienced with text generation.
World models and video generation systems are approaching visual indistinguishability (e.g., Sora-class systems), but they are not yet physically consistent enough for training robot controllers. Temporal coherence and contact dynamics remain bottlenecks.
That said:
- Within 1–2 years, training robot controllers primarily in simulation may become viable.
- Robotics will achieve impressive pick-and-place capabilities within ~2 years.
- The U.S. is underestimating China’s hardware and robotics supply chain advantage.
Intelligence: Sequential Today, Parallel Tomorrow
A provocative idea: today’s intelligence is fundamentally sequential — token by token, step by step.
But human cognition is massively parallel.
Diffusion-based language models generating ~1,000 tokens per second were cited as early signals of a shift away from strictly autoregressive thinking. The broader claim:
Intelligence will need to become parallel to break through current ceilings.
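The sequential-vs-parallel contrast can be made concrete with a toy sketch. This is purely illustrative: random choices stand in for model forward passes, and the point is only the step count, not the quality of the output. An autoregressive decoder needs one step per token, while a diffusion-style decoder updates every position at once over a small, fixed number of refinement rounds.

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]  # toy vocabulary

def autoregressive_decode(length):
    """Sequential: one model call per token -> `length` steps total."""
    tokens, steps = [], 0
    for _ in range(length):
        tokens.append(random.choice(VOCAB))  # stand-in for a model forward pass
        steps += 1
    return tokens, steps

def diffusion_style_decode(length, refinement_rounds=4):
    """Parallel: every position updated at once, for a fixed number of rounds."""
    tokens = [random.choice(VOCAB) for _ in range(length)]  # start from "noise"
    steps = 0
    for _ in range(refinement_rounds):
        # Re-predict all positions simultaneously -- one step regardless of length.
        tokens = [random.choice(VOCAB) for _ in tokens]
        steps += 1
    return tokens, steps

_, seq_steps = autoregressive_decode(1000)
_, par_steps = diffusion_style_decode(1000)
print(seq_steps, par_steps)  # 1000 sequential steps vs 4 parallel rounds
```

The latency gap is what makes ~1,000 tokens per second plausible: step count decouples from sequence length.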
Related idea: Transformer architectures may not be sufficient for AGI without explicitly incorporating time and memory dynamics into their core design.
The Efficiency Gap Is Still Massive
One debate centered around computability and efficiency:
- The estimated lifetime electricity cost of a human brain is roughly $500 at retail energy prices.
- Frontier AI systems require training runs costing hundreds of millions to approach similar levels of competence in narrow domains.
Implication: There are still orders of magnitude of efficiency gains available.
Algorithmic improvements may ultimately matter more than raw compute expansion.
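A back-of-envelope check makes the gap concrete. All inputs below are rough assumptions (a ~20 W brain, an 80-year span, $0.15/kWh retail electricity, a $300M training run); with a shorter accounting window or cheaper wholesale rates the brain figure drops toward the $500 cited above. Either way, the gap to a frontier training run comes out around five orders of magnitude.

```python
# Back-of-envelope comparison: biological vs frontier-AI energy economics.
# All numbers are rough assumptions, not measurements.
BRAIN_WATTS = 20            # typical estimate of human brain power draw
LIFESPAN_YEARS = 80
RETAIL_USD_PER_KWH = 0.15
TRAINING_RUN_USD = 300e6    # assumed frontier training run cost

hours = LIFESPAN_YEARS * 365 * 24
brain_kwh = BRAIN_WATTS * hours / 1000
brain_cost = brain_kwh * RETAIL_USD_PER_KWH

gap = TRAINING_RUN_USD / brain_cost
print(f"lifetime brain energy: {brain_kwh:,.0f} kWh ~= ${brain_cost:,.0f}")
print(f"gap vs one training run: ~{gap:,.0f}x")
```

Under these assumptions the brain's lifetime energy bill is about $2,000, and the ratio to a single training run is roughly 10^5 — the "orders of magnitude of efficiency gains" in question.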
Coding Agents as a Path to AGI
A recurring thesis: intellectual work already happens on computers.
If AGI emerges first anywhere, it may not be in robotics — it may be in coding agents capable of autonomously:
- Writing production software
- Debugging large systems
- Designing experiments
- Improving their own infrastructure
Some argued that certain frontier systems are already better than any single engineer in constrained domains.
Others pushed back, pointing to brittleness, lack of generality, and robustness failures.
The table was split on whether “AGI” has already been achieved in narrow economic terms.
Evaluation Is the Quiet Bottleneck
Benchmarks are saturating. Public leaderboards are gamed. Real-world robustness remains unsolved.
Formal verification is gaining renewed attention in AI safety discussions. If models are writing critical infrastructure code, probabilistic correctness may not be enough.
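The distinction from testing is that a proof covers every input, not a sampled few. A minimal Lean sketch (the function `double` is hypothetical, chosen only for illustration):

```lean
-- Toy illustration: a machine-checked proof guarantees the property for
-- *all* inputs, unlike a test suite that samples a handful of cases.
def double (n : Nat) : Nat := n + n

theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

Scaling this style of guarantee from toy lemmas to model-written infrastructure code is the open problem the discussion pointed at.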
A strong consensus emerged:
Evaluation may be the true bottleneck of 2026.
Robotics & Physical AI: The Infrastructure Layer Matters
Discussion shifted toward physical AI and supply chains:
- Chinese robotics hardware development is accelerating rapidly.
- U.S. investors are underestimating the scale and velocity of progress.
- Supply chain and manufacturing infrastructure may be the real constraint on robotics scaling.
Waymo adoption was cited as evidence that society will accept AI in life-critical contexts faster than expected — once reliability crosses a certain threshold.
World Models & Video Generation
Video generation systems are crossing visual indistinguishability thresholds.
But visual realism ≠ physical consistency.
Key challenges remaining:
- Temporal consistency over long rollouts
- Contact dynamics modeling
- Bridging visual world models with control policies
- Using video models for training, not just evaluation
The optimistic view: 1–2 years to meaningful simulation-based robot controller training.
The cautious view: physics modeling remains underdeveloped relative to visual modeling.
Systems Over Single Models
There was broad agreement that AGI will likely emerge from systems, not a single monolithic model.
Important research directions raised:
- Continual learning
- Context engineering
- Long-horizon memory management
- Active exploration in RL vs passive token sampling
- Multi-modal integration beyond vision
- Distributed systems for heterogeneous AI workloads
One especially sharp idea:
Context engineering may become a bigger compute workload than training.
Inference-time compute — reasoning loops, tool calls, search trees — may rival or exceed training-time compute.
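The training-vs-inference comparison can be sketched with the common rule-of-thumb approximations (~6 · params · tokens FLOPs for training, ~2 · params FLOPs per generated token at inference). The model size, training set size, and tokens-per-query below are hypothetical round numbers, not figures from any lab.

```python
# Rule-of-thumb FLOP accounting (standard approximations, assumed numbers):
#   training  ~= 6 * params * training_tokens
#   inference ~= 2 * params per generated token
PARAMS = 1e12          # hypothetical 1T-parameter model
TRAIN_TOKENS = 15e12   # hypothetical 15T-token training corpus

train_flops = 6 * PARAMS * TRAIN_TOKENS

# An agentic query with reasoning loops, tool calls, and search can emit far
# more tokens than a single chat reply; assume 100k tokens per query.
tokens_per_query = 1e5
flops_per_query = 2 * PARAMS * tokens_per_query

queries_to_match_training = train_flops / flops_per_query
print(f"~{queries_to_match_training:,.0f} agentic queries ~= one training run")
```

Under these assumptions, a few hundred million heavy agentic queries match the entire training run — a volume a popular product can serve in weeks, which is why inference-time compute plausibly rivals training.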
Economic Shockwaves
The conversation turned speculative at times.
Questions raised:
- Will 50% of GDP be AI-generated by the end of 2027?
- What does a post-AGI economy even look like?
- What are the limits of human intelligence in comparison to synthetic systems?
One bold prediction:
Someone at this table will trigger a $100B public company loss within two months — via an AI-related incident that becomes major media news.
The point wasn’t recklessness. It was acknowledgment of the scale of leverage now in play.
🔥 Hot Takes from the Table
Here’s a distilled list of sharper claims and predictions from the evening:
- We are underinvesting in simulation — robotics will have its LLM moment within 2–3 years.
- Teleoperation doesn’t scale; physics engines are massively underutilized.
- Transformer architectures alone cannot achieve AGI — time must be built into the architecture.
- Intelligence today is sequential; the next leap requires parallelism.
- Coding agents are the most likely path to AGI because intellectual work is already digital.
- Some frontier models are already better than any single engineer in specific domains.
- Context engineering may soon consume more compute than training.
- Modeling raw sensory data may be "AGI-complete" — and we're underinvesting in it.
- Formal verification will become central to AI safety.
- China’s robotics supply chain advantage is underestimated in the U.S.
- Efficiency gains available in AI are still orders of magnitude away from biological efficiency.
- Experience may be inversely correlated with AI effectiveness — newer graduates sometimes adapt faster.
- Scaling laws will continue — but will hit real-world interface limits.
- Within 2 years, robotics pick-and-place will be impressively solved.
- Within 1–2 years, simulation-trained robot controllers may become viable.
- AI-for-math and AI-for-science applications are returning to prominence.
- Someone will accidentally trigger a nine-figure market event through AI misuse or failure.
What Felt Different
Compared to similar conversations even a year ago, the tone has changed.
Less “AGI next quarter.”
More:
- Systems design
- Infrastructure
- Memory and time
- Evaluation
- Supply chains
- Economic consequences
There was also an undercurrent of something else: inevitability.
Not hype. Not panic.
Just the sense that the slope is steepening.
Why We Host These Dinners
AGI House exists to create rooms where frontier builders can think together — candidly and off-stage.
No slides.
No recorded panels.
No performative optimism.
Just honest technical debate about what’s actually happening.
We’ll continue hosting these dinners regularly. If you’re working at the edge — in research, robotics, systems, infrastructure, safety, or applied AI — we’d love to have you at a future one.
The frontier isn’t a place.
It’s a conversation.
Please drop a note to info@agihouse.org if you're interested in sponsoring the next dinner.


