AGI House Dinner Series, in partnership with Imagine.io | April 23, 2026

20+ robotics founders, researchers, and engineers. Jeffersonian-style dinner. One conversation. One question: Why does every robotics demo look incredible and every deployment story sound like a war journal?

Marco Pavone (Professor @ Stanford / Sr. Director of Physical AI @ NVIDIA) · Jesse Levinson (CTO @ Zoox) · Stuart Bowers (Sr. Director @ Google DeepMind) · Mengyuan Yan (MTS @ OpenAI) · Siddharth Saha (MTS @ Tesla Optimus) · Samir Menon (CEO @ Dexterity) · Saman Farid (CEO @ Formic) · Keven Wang (CEO @ UnitX) · Jason Ma (Cofounder @ Dyna Robotics) · Sampriti Bhattacharyya (CEO @ Navier) · David Chen (GM Robotics @ LiveKit) · Alex Bergman (Cofounder @ Rhode AI) · Bingfeng Xia (MTS @ AWS) · Varun Ganapathi (CEO @ Everest Robots) · Surbhi Rathore (Invoca / ex-Symbl.ai) · Bill Sun (CEO @ GALPHA.AI) · Joe Wang (CEO @ JW Horizon) · Davis Warnock (Investor) · Preet Singh & Rachit Nanda (CEO @ Imagine.io, sponsor) · Rocky Yu (CEO @ AGI House, host)

#1: "No Amount of AI Will Fix a 55-Pound Box"

Three deployment killers — none of them what AI Twitter talks about.

Reliability: Factories need 99.9% uptime. Your GPU system has to survive copper dust that shorts circuit boards. One founder has deployed 1,000+ vision systems, and for 90% of his customers it was their first-ever GPU in production. He lived in a factory for three months to understand the deployment surface.

Integration: Factories run on PLCs — 50-year-old programmable logic controllers. Nine major vendors, zero standardization. Making AI talk to a 1990s PLC is a completely different problem from making your model work.

Humans: AI vision systems more accurate than human inspectors don't just catch defects — they expose that the manufacturing process is broken. Parts humans approved for years get rejected. The factory's reaction isn't gratitude. It's denial. "The machine uncovers the ugly truth about manufacturing."

And underneath it all: physics. "We have robots that can only pick up 50 pounds. Customer calls Wednesday, new order is 55 pounds. No amount of AI solves that." Flexibility has a weight limit.
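To put the 99.9% uptime figure from the reliability point in concrete terms, the short Python sketch below converts an uptime target into a downtime budget. The function name and the assumption of a line that runs around the clock, 365 days a year, are illustrative and not from the dinner discussion.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: the line runs around the clock


def downtime_budget_hours(uptime: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of allowed downtime in a period for a given uptime fraction."""
    if not 0.0 <= uptime <= 1.0:
        raise ValueError("uptime must be a fraction between 0 and 1")
    return (1.0 - uptime) * period_hours


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        hours = downtime_budget_hours(target)
        print(f"{target:.2%} uptime -> {hours:.2f} hours of downtime per year")
```

Under that assumption, 99.9% leaves under nine hours a year for every failure, restart, and maintenance window combined.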

#2: "Do 97 Out of 100 Things Right and You Have Absolutely Nothing"

Three companies with real deployment numbers grounded the conversation.

Towel folding: 20–30 seconds per towel. 1,000+ per robot per day. 24/7. 100% daily success rate across gyms, hotels, salons. Key insight: towel folding works because the world converged on a standard before the robot showed up. Napkins? Every restaurant wants them different. That kills generalization.

Factory vision: 1,000+ systems inspecting battery cells. The founder moved into a factory for three months. "Deployment is arguably the most important milestone — and the most underappreciated."

Robo-taxis: 12 years in. 400,000 customers. 100 robots. ~100,000 miles/week. Just quadrupled the San Francisco geofence to 11 square miles. Still $0 revenue. Miles per stuck event went from about 300 three years ago to roughly 100,000 now: more than two orders of magnitude, through pure grinding. One more order of magnitude is achievable before you hit blown tires and acts of God.
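The orders-of-magnitude claim is easy to check. Here is a minimal sketch using only the figures quoted above: about 300 and about 100,000 miles per stuck event, and roughly 100,000 fleet miles per week. The function names are illustrative, and applying today's weekly mileage to the older rate is a hypothetical comparison, not something reported at the dinner.

```python
import math


def orders_of_magnitude(before: float, after: float) -> float:
    """Base-10 log of the improvement ratio between two rates."""
    return math.log10(after / before)


def stuck_events_per_week(fleet_miles_per_week: float, miles_per_stuck_event: float) -> float:
    """Expected stuck events per week at a given fleet mileage."""
    return fleet_miles_per_week / miles_per_stuck_event


if __name__ == "__main__":
    gain = orders_of_magnitude(before=300, after=100_000)
    print(f"Improvement: {gain:.2f} orders of magnitude")

    weekly_miles = 100_000  # fleet mileage quoted in the text
    for label, rate in [("then", 300), ("now", 100_000), ("one more order", 1_000_000)]:
        events = stuck_events_per_week(weekly_miles, rate)
        print(f"{label}: {events:.1f} stuck events per week")
```

At the quoted fleet mileage, that is the difference between hundreds of stuck events a week and about one.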

The double-collision story: a robo-taxi got rear-ended, passed remote diagnostics, was sent home autonomously — and got hit again three minutes later by a different driver. Everything that can fail, will fail.

#3: "Teleoperation Dies at 80%"

One claim landed like a grenade: "Robotics doesn't have a model problem. It has a real-world data problem." Everyone agreed data is the bottleneck. The fight was over how to get it.

An 18-year veteran's framework: teleoperation works from 0 to 80% and then fails. "I've never seen an application where it was useful beyond 80%." To push to 99.99%, you need the system finding its own edge cases — and those get exponentially rarer. His approach: find rare failures, amplify surgically with simulation. "Just throwing more data adds bulk where you don't care."
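A rough way to see why edge cases get so scarce: if each episode fails independently with the same probability (a simplifying assumption, not something claimed at the table), the expected number of episodes between failures is the reciprocal of the failure rate. The function name is illustrative.

```python
def episodes_per_failure(success_rate: float) -> float:
    """Expected episodes per observed failure, assuming independent episodes."""
    if not 0.0 <= success_rate < 1.0:
        raise ValueError("success_rate must be in [0, 1)")
    return 1.0 / (1.0 - success_rate)


if __name__ == "__main__":
    for rate in (0.80, 0.97, 0.99, 0.9999):
        print(f"{rate:.2%} success -> one failure every ~{episodes_per_failure(rate):,.0f} episodes")
```

At 80% a failure turns up about every 5 episodes. At 99.99% it takes about 10,000, so each new failure example costs roughly 2,000 times more collection effort, which is the case for targeting rare failures rather than adding bulk.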

A professor offered the framework the room kept referencing all night: simulation serves testing, training, and validation — each with radically different fidelity requirements. "The Mars rover sky crane was only tested in simulation. That worked." But for training, models latch onto artifacts. For validation, you need absolute accuracy. "Generating corner cases is easy. Generating plausible, informative corner cases — that's very complicated."

One team's radical alternative: skip robot data entirely. Train on internet-scale video to learn physics, convert to real-time robot trajectories. A year ago this seemed implausible. They've now demonstrated it. Jury's still out on whether it scales.

#4: "Tactile Sensing Is Always 10 Years Away"

Brutal verdict. A Stanford faculty member's long-running joke got quoted: "Tactile sensing is always 10 years away." One team tested commercial tactile sensors and found few meet their published specs.

The deeper problem: force and friction simulation can't model contact physics at the fidelity needed for sim-to-real transfer in manipulation. The legged locomotion community found a clever hack — penalize the algorithm for behaviors the simulator is bad at (foot slip, high jerk). Modern legged stacks use 7–12 reward shaping functions to bridge the gap. Nobody has cracked the equivalent trick for dexterous hands.
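The penalty trick can be sketched as a shaped reward: a task term minus weighted penalties for the behaviors the simulator models poorly. This is an illustrative sketch only. The `StepInfo` fields, the weights, and the specific terms (velocity tracking, foot slip, jerk) are assumptions chosen for the example, not any particular team's reward, and a real stack would use more terms, as the 7–12 figure above suggests.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StepInfo:
    """Hypothetical per-step quantities read back from a simulator."""
    forward_velocity: float            # m/s
    target_velocity: float             # m/s
    foot_in_contact: Sequence[bool]    # one flag per foot
    foot_speed: Sequence[float]        # tangential foot speed, m/s
    joint_accel: Sequence[float]       # rad/s^2, this step
    prev_joint_accel: Sequence[float]  # rad/s^2, previous step
    dt: float                          # simulation step, s


def shaped_reward(step: StepInfo, w_slip: float = 0.5, w_jerk: float = 1e-6) -> float:
    """Task reward minus penalties for foot slip and jerk (weights are made up)."""
    # Task term: 1.0 when the target velocity is matched, decaying with error.
    error = step.forward_velocity - step.target_velocity
    tracking = math.exp(-(error ** 2) / 0.25)

    # Foot slip: a foot that is in contact with the ground should not be sliding.
    slip = sum(
        speed ** 2
        for touching, speed in zip(step.foot_in_contact, step.foot_speed)
        if touching
    )

    # Jerk: finite-difference rate of change of joint acceleration.
    jerk = sum(
        ((a - b) / step.dt) ** 2
        for a, b in zip(step.joint_accel, step.prev_joint_accel)
    )

    return tracking - w_slip * slip - w_jerk * jerk
```

The design choice is that the policy is steered away from regimes where simulated contact is least trustworthy, rather than the simulator being made more accurate.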

#5: "Towel Folding Is the Coding Agent of Robotics"

The night's most debated analogy: Anthropic stumbled into coding as its killer vertical and used that flywheel for everything else. Could robotics have an equivalent moment?

One founder nominated towel folding — seriously. 24/7 hospitality deployment data feeds back into a general-purpose model. "Anthropic went down coding, now they're using it to solve everything else. Robots will follow a similar path."

The pushback was fierce. A researcher: "Anthropic's coding success required a strong base model first. In 2012 a coding agent wouldn't have worked." A veteran roboticist: "For 20 years I've seen the same toy examples. Clean table, nice light, everything arranged. Move the table 5cm — it doesn't work." Without agreed-upon evaluation sets, there's no way to even measure general progress.

Near-consensus: the moment won't come from a task (towels) but a category (manufacturing, logistics, hospitality) with enough market size to fund the flywheel and enough diversity to force generalization. Nobody agreed on which category.

#6: "Humanoids Aren't the Only Game in Town"

For humanoids: maximum generalization in a human-designed world. Handles, stairs, doorways — the world was built for human bodies.

Against: complexity, cost, fragility. When a humanoid trips in someone's home, who picks it up? For any specific task, a simpler specialized machine will outperform it.

The real variable is utilization. Even a $20,000 general-purpose robot can't sit idle. It needs high utilization on one task or deployment across many. That's a narrow economic band. Both form factors will coexist — the question is where the boundaries fall.
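The utilization argument comes down to one division. In this minimal sketch, the $20,000 price is from the discussion and everything else (the three-year amortization window, the utilization levels, the function name) is a hypothetical input. It also ignores maintenance, power, and integration costs.

```python
HOURS_PER_YEAR = 365 * 24


def hardware_cost_per_productive_hour(
    price_usd: float,
    utilization: float,
    amortization_years: float = 3.0,  # hypothetical amortization window
) -> float:
    """Purchase price spread over the hours the robot actually works."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    productive_hours = amortization_years * HOURS_PER_YEAR * utilization
    return price_usd / productive_hours


if __name__ == "__main__":
    for utilization in (0.10, 0.40, 0.80):
        cost = hardware_cost_per_productive_hour(20_000, utilization)
        print(f"{utilization:.0%} utilization -> ${cost:.2f} per productive hour")
```

With these inputs, the same robot costs eight times more per productive hour at 10% utilization than at 80%.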

#7: "The FSD Moment Already Happened — In Defense"

Stop asking when robotics hits its FSD moment. It already happened.

A reported 24,000% increase in autonomous systems spending. $75 billion for autonomy in the 2027 U.S. defense budget. Aerial drones and surface vessels eating most of it. "A $13 billion aircraft carrier can be threatened by thousand-dollar drones."

One company at the table: 5,000+ operational hours on autonomous combat vessels. 2,000 nautical miles range at 20 knots (5x comparable vessels). Unmanned Atlantic crossing planned this year. Sub-millimeter precision for drone takedowns. All on edge compute — no cloud, GPS-jammed environments.

The tech stack? Mostly traditional autonomous systems. "Running a robot with a gun using a VLA seems like a bad idea." But when stakes are existential and budgets are unlimited, deployment accelerates by orders of magnitude.

#8: "The Most Underrated Strategy in Robotics Is Cuteness"

Nobody expected this one. A veteran of multiple major robotics programs argued the fastest path to useful data isn't factories or roads — it's cute robots that do nothing useful.

"Roomba achieved large-scale home deployment but wasn't sophisticated enough to get data back." The pitch: deploy small, safe, sensor-rich companion robots into homes. No manipulation. Just interactions, navigation, and learning what "good" looks like. "Get a large volume of robots that aren't trying to move things. You'll collect data faster than trying to deploy robots that do."

He teaches a Stanford class where students build cute robot dogs for children's hospitals. Sounds frivolous. The data strategy behind it isn't.

The Amazon Astro story sealed it. His wife loved the robot — it greeted her at 5:30am during medical residency. One fall day, the fireplace auto-ignited, blinded the Astro's sensors, and it walked into the fire and melted its own face. "It's still in our house because my wife won't throw it away." Every operating domain is infuriatingly difficult — including a living room on the first day of fall.

The AGI Dinner Series brings together founders, researchers, and operators for candid conversations at the frontier of AI. Held under the Chatham House Rule: ideas shared freely, no specific attribution. Hosted by Rocky Yu at AGI House, in partnership with Preet Singh and Rachit Nanda at Imagine.io.

If you’re interested in sponsoring the next dinner, please drop a note to info@agihouse.org.

Physical AI: From Demo to Deployment — The Uncomfortable Truths