Introduction

In our pre-event thesis, Embodied AI’s Real Race: From Hardware Parity to Data Monopoly, we argued that the defining bottleneck in robotics has shifted from hardware to data—and that the winners will be companies that vertically integrate three data streams: simulation, human video, and teleoperation ground truth. On February 7, 2026, AGI House hosted an intensive Autonomous Robot Build Day in San Francisco, bringing together founders, engineers, and researchers from eight robotics and AI companies for a day of structured prototyping, panel discussion, and technical interviews.

The event featured two main formats. First, an expert panel held live before the audience, featuring Tony Zhao (CEO of Sunday Robotics), Jason Ma (DYNA Robotics), Varun Ganapathi (Everest Robotics), and Ariyan Kabir (GrayMatter Robotics).

Second, we conducted extended one-on-one interviews with three company leaders: Henry, founding engineer and head of software at Sunday Robotics; Mustafa Bal, co-founder and CEO of Nomadic ML; and Xenia Kupriyanova, co-founder and CEO of Ultimate Fighting Bots. Alongside these, Mustafa from Nomadic ML and Ariyan from GrayMatter gave technical presentations on their platforms.

This memo uses those conversations and presentations to test our original thesis against what practitioners are actually building. It also expands significantly on the market landscape, mapping the major players across the full embodied AI stack globally.

Hardware Is Largely Solved, But Opportunities Remain

Physical AI systems today broadly fall into three categories: autonomous vehicles, industrial robots operating in structured factory environments, and humanoid robots designed for general-purpose task completion in uncertain settings. While the first two categories have established ecosystems and incumbents—industrial robotics, for instance, continues to be dominated by ABB, FANUC, Yaskawa, and KUKA—humanoid robotics is a greenfield market with no incumbents, nearly all entrants arriving within the last five to ten years. It is the humanoid category, with its promise of general-purpose physical intelligence, that represents both the largest long-term opportunity and the steepest remaining technical challenge.

A critical structural feature of the supply chain is that no major component supplier exists solely for humanoid robotics. The market is simply not large enough to justify dedicated product lines. Nearly all suppliers—whether Japanese actuator manufacturers, German sensor firms, or American compute providers—serve multiple end-use markets such as automotive and aerospace, and will only optimize products for robotics-specific use cases if demand warrants it. This means each robotics manufacturer is assembling a unique bill of materials from a fragmented landscape of cross-industry suppliers. If demand shifts in a supplier's core market, production volume for high-mix, low-volume robotics orders is deprioritized. The result is that vertical integration becomes a strategic response to supply chain fragility rather than a pure efficiency play.

However, determining which components to design in-house versus buy commercial off-the-shelf (COTS) remains a case-by-case decision. Sunday Robotics, for instance, designs custom teleoperation gloves because its mission set of household chores sometimes requires sensory input beyond visual and depth data: in sock folding, much of the manipulation depends on reasoning about the blind side of the sock and on tactile feedback once the end effectors are inside it. As Tony Zhao put it during the panel, the build-versus-buy decision carries significant technical risk for physical AI companies, since designing hardware in-house commits substantial engineering resources. His heuristic: check whether a component's input/output interface is well-defined; if it is, the component can likely be COTS. We see this as a point where the market remains underdeveloped: there is no consolidation, and most robotics BOMs are highly diversified across vendors worldwide. In particular, we imagine a San Francisco-based factory specializing in rapid delivery of a full assortment of humanoid hardware, from high-torque-density motors to RGB-D cameras, would enable faster iteration cycles.

Similarly, even form factor choices for humanoid robotics remain deeply consequential and are not converging. Sunday chose a wheeled base over legs, explicitly sacrificing stair-climbing capability for passive stability—a safety-first design choice driven by their consumer deployment target. Henry explained the logic directly: in the worst case, if all software fails, a wheeled robot's arm may drop but the platform will not topple onto a child or pet. GrayMatter, targeting industrial surface finishing on high-value parts like fighter jet components, operates fixed-arm systems that must be geometry-agnostic across arbitrary part shapes. Ultimate Fighting Bots fields bipedal humanoids optimized for dynamic, adversarial contact. 

Taken together, the evidence confirms that hardware, while not fully solved, is largely commercialized and no longer the binding constraint on physical AI adoption. But until demand specifically for humanoids materializes at scale, the software and data layer represents both the highest-leverage investment opportunity and the most tractable path to accelerating the market. We turn to that layer next.

Market Map: The Physical AI Software Stack

The embodied AI landscape can be organized into six functional layers. These are not a clean linear value chain; they form feedback loops, where data infrastructure companies enable better policies, which generate more deployment edge cases, which feed back into data platforms. Below we define each layer and map the major players globally, including the Chinese companies that now dominate global shipment volumes.

Vertically Integrated Humanoid OEMs

A handful of companies span the entire embodied AI stack—from hardware design through proprietary foundation models, simulation, data collection, and deployment. These vertically integrated players deserve separate treatment because their competitive advantage lies not in any single layer but in the closed loop between them: proprietary hardware generates proprietary data, which trains proprietary models, which deploy on proprietary hardware, which generates more data. This flywheel is the defining structural advantage in embodied AI.

Company Vertical Integration Profile
Figure AI $39B valuation (Sep 2025). Designs its own humanoid hardware (Figure 02/03), develops the Helix VLA model in-house—a dual-system architecture (System 2 for high-level planning at 7–9 Hz, System 1 for low-level control at 200 Hz) trained on ~500 hours of proprietary teleop data. Runs entirely on embedded GPUs. BotQ factory targets 12,000 units/year. Ended OpenAI partnership to vertically integrate robot AI. Deployed at BMW Spartanburg.
Tesla (Optimus) Leverages FSD neural network infrastructure and data pipeline. Unified World Simulator shared between autonomous driving and Optimus. Billions of miles of driving video transfer to embodied reasoning via shared sim. Proprietary hardware, data, models, and compute. Most vertically integrated player in the space by data volume.
1X Technologies NEO humanoid entering limited home testing. World model approach to learning physics from video. Internal data collection and training pipeline. $125M Series B at ~$1.2B valuation. Among the first to target direct-to-consumer home deployment.
AgiBot Global shipment leader with 5,100+ units in 2025. Develops in-house VLA models and trains on proprietary fleet data. Leverages Chinese auto supply chain for cost. RynnBrain platform for AI training.
Unitree Second-largest shipper globally (4,200 units). R1 humanoid and G1 companion robot. Internal simulation and policy training. Dominates price-performance curve via vertically integrated manufacturing.
UBTECH Walker S2 deployed at Foxconn, BYD, Geely, FAW-VW. Internal AI stack spanning perception, planning, and control. Strong industrial deployment pipeline in Chinese automotive.

Below we map the software layers. The segmentation and naming of these layers is heavily inspired by an analysis by AWS.

Layer 1: Connect, Digitize & Store Data

The foundation of any Physical AI system lies in capturing, digitizing, and structuring real-world information. IoT devices, sensors, cameras, LiDAR, and other hardware collect multimodal state data from physical environments—creating digital representations through 1D data streams, 2D images, 3D point clouds, and metadata from operational systems. For robotics specifically, this layer includes on-robot sensor suites, the communication infrastructure that moves data in real time, and the storage and indexing platforms that make it retrievable at scale. Physical AI systems require a dual-pathway data architecture: low-latency sensor data streams directly to edge ML models for immediate reactive control, while higher-level reasoning tasks leverage cloud-connected storage and enterprise system integration. The challenge is not just volume but heterogeneity — point clouds, video, force readings, and joint-state trajectories must be indexed, time-aligned, and retrievable for both real-time operations and long-term model training.
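The time-alignment problem described above can be sketched minimally: pairing each sample of a low-rate stream (e.g. camera frames) with the nearest sample of a higher-rate stream (e.g. joint states), within a tolerance. This is an illustrative toy, not any vendor's API; the stream names, rates, and the 50 ms tolerance are our own assumptions.

```python
import bisect

def align_nearest(ref_stamps, other_stamps, tolerance=0.05):
    """For each timestamp in a low-rate stream (e.g. 10 Hz camera frames),
    find the index of the nearest sample in a higher-rate stream
    (e.g. 100 Hz joint states), or None if nothing lies within
    `tolerance` seconds. Both inputs must be sorted ascending."""
    pairs = []
    for t in ref_stamps:
        i = bisect.bisect_left(other_stamps, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other_stamps)]
        best = min(candidates, key=lambda j: abs(other_stamps[j] - t), default=None)
        if best is not None and abs(other_stamps[best] - t) <= tolerance:
            pairs.append((t, best))
        else:
            pairs.append((t, None))  # dropped frame / sensor gap
    return pairs

# Hypothetical streams: camera at 10 Hz, joint states at 100 Hz
camera = [0.0, 0.1, 0.2]
joints = [round(0.01 * k, 2) for k in range(25)]
aligned = align_nearest(camera, joints)
```

Production platforms in this layer solve the same problem at petabyte scale and across many more modalities, but the core primitive, nearest-neighbor join on timestamps with explicit handling of gaps, is the same.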

Company / Platform Role in This Layer
LiveKit Open-source real-time audio/video transport infrastructure. Powers voice in Grok and ChatGPT. LiveKit Cloud enables remote teleoperation from anywhere, making fleet-scale data collection location-independent. Critical enabler for the connect layer—without low-latency A/V transport, remote teleoperation data collection breaks down.
Rerun Open-source multimodal data stack for Physical AI ($17M seed, Mar 2025). Purpose-built database and cloud platform for ingesting, storing, and visualizing video streams, 3D scenes, and tensor data from robots, drones, and AVs. Built in Rust for high-performance rendering. Team includes the creator of the rosbag format and egui (largest Rust GUI framework). Open-source visualization toolkit already adopted by Meta, Google, Hugging Face, and Unitree.
Matterport 3D spatial capture platform. Creates digital twins of physical spaces using LiDAR and photogrammetry. Provides the kind of pre-mapped environmental data that simulation and navigation systems depend on.
Foxglove A leading robotics observability and data management platform ($40M Series B, Nov 2025). Unifies multimodal data collection, storage, indexing, and visualization into a single platform. The Foxglove Agent runs on-robot to manage data at the edge—handling uploads, retention policies, and device-level indexing. Used by NVIDIA, Amazon, Anduril, Shield AI, Dexterity.

Layer 2: Segment, Label & Understand Data

This layer transforms raw multimodal data into AI-ready training signals. It encompasses data manipulation (transformations, cleaning, temporal resampling), annotation and labeling of sensor streams, and—critically—automated curation that surfaces the high-value, novel, and failure-mode instances buried in terabytes of footage. This was one of the most important signals from the build day: the data bottleneck is not just about volume, it is about finding the signal in the noise. As the Nomadic ML CEO argued at the event, you need tools that can identify the unusual and high-value instances at superhuman speed and consistency.
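As a toy illustration of the curation idea (surfacing the rare, novel clips buried in near-duplicate footage), greedy farthest-point selection over clip embeddings picks outliers first. This is a hand-rolled sketch under our own assumptions, not Nomadic ML's method; real systems layer VLM-based triaging and failure-mode classifiers on top of a ranking like this.

```python
import numpy as np

def curate_by_novelty(embeddings, budget):
    """Greedy farthest-point selection: repeatedly keep the clip whose
    embedding is farthest from everything already kept. A toy stand-in
    for the 'find the signal in the noise' curation step."""
    kept = [0]  # seed with the first clip
    dist = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(kept) < budget:
        nxt = int(np.argmax(dist))  # most novel remaining clip
        kept.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return sorted(kept)

rng = np.random.default_rng(0)
routine = rng.normal(0.0, 0.1, size=(98, 16))   # near-duplicate nominal clips
edge_hi = rng.normal(5.0, 0.1, size=(1, 16))    # rare outlier (index 98)
edge_lo = rng.normal(-5.0, 0.1, size=(1, 16))   # rare outlier (index 99)
selected = curate_by_novelty(np.vstack([routine, edge_hi, edge_lo]), budget=3)
print(selected)  # the two outliers are picked alongside the seed: [0, 98, 99]
```

With a budget of three out of one hundred clips, the selection lands on the two synthetic edge cases, which is exactly the behavior a curation layer is paid for.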

Company / Platform Role in This Layer
Scale AI Largest general-purpose data labeling platform. Human-in-the-loop annotation serving major AV and robotics companies. Expanding into robotics-specific workflows. The incumbent in labeling but increasingly challenged by automated approaches from companies like Nomadic ML.
Labelbox Enterprise data labeling platform with $189M raised. Software-first approach giving teams direct control over annotation pipelines via AI-assisted tools, model evaluation, and integrated workforce management. Customers include Google Cloud. Differentiates through deep MLOps integration—designed to plug into existing training pipelines rather than operate as a standalone service.
SuperAnnotate AI-assisted annotation platform spanning image, video, text, audio, and 3D data. Runs a global network of domain-expert annotators ("SME Careers") for managed labeling services. Strong in semantic segmentation via its superpixel-based tooling. Differentiates by combining the platform with a built-in expert workforce—closer to Scale AI's full-service model but with more transparency and client control over the process.
Segments.ai Multi-sensor data labeling platform purpose-built for robotics and autonomous vehicles. Specializes in synchronized 3D point cloud and 2D image annotation with ML-assisted tooling (Superpixel 2.0, auto-segmentation). Used across AV, delivery robotics, agriculture, and underwater autonomy. Focused on the hard problem of multi-modal ground-truth generation that foundation models cannot yet fully automate.
Voxel51 Open-source visual AI data platform for exploring, curating, and evaluating computer vision datasets. FiftyOne lets teams visualize multimodal data (images, video, 3D), spot failure modes, weed out mislabeled data, and run model evaluation—all in one workspace. Used by LG Electronics, Berkshire Grey, RIOS Intelligent Machines.
Nomadic ML Provides a Waymo-level video inference engine built on ensembles of VLMs. Automates behavior triaging, failure-mode detection, edge-case surfacing, and compliance validation across operational footage. Serves Fortune 500 AV operators and robotics scale-ups (Bedrock, Zendar, 12 Labs). Replaces armies of human labelers with consistent, superhuman-speed analysis.

Layer 3: Simulation and Training

Simulation environments provide safe, controlled spaces for training autonomous systems without real-world risk. These capabilities span digital twins, synthetic data generation, physics-based world models, and the foundation model architectures (VLAs, diffusion policies) that translate sensory input into motor commands. The simulation layer ranges from basic digital representations to high-fidelity physics engines (MuJoCo, NVIDIA Isaac) to emerging world foundation models that learn physics from video. This layer also includes the VLA models and policy architectures that form the “brain” of the system—the robotics analog to LLMs in digital AI.

Company / Model Approach & Relevance
Simulation & World Models
NVIDIA Isaac Sim Promises 10,000+ concurrent sim environments on single GPU. Domain randomization for sim-to-real transfer. Isaac Lab built on NVIDIA Warp; compatible with MuJoCo. Targets 70x acceleration of robotics ML workloads.
NVIDIA Cosmos World foundation model generating physically plausible synthetic video for training. Bridges the gap between internet video pretraining and physics-grounded simulation.
Google DeepMind (MuJoCo) Open-source physics engine widely used in RL research. MuJoCo-Warp integration with NVIDIA for GPU-accelerated sim. Foundation for many academic and commercial robotics pipelines.
Foundation Models & Policy Architectures
Physical Intelligence (π0) >$5B valuation. π0 is a generalist VLA combining multi-task, multi-robot data with a new architecture enabling cross-embodiment transfer. Open-sourced π0 and π0-FAST (5x faster training via new action tokenizer). Backed by Bezos, OpenAI, Sequoia.
Skild AI $300M raised at $1.5B. Building a “general-purpose brain for robots” via massive-scale diverse sim training. Foundation model approach: one model, many embodiments. STN GPU-One partnership for compute.
NVIDIA GR00T N1 Open foundation model for humanoid reasoning and skill acquisition. Dual-system architecture (vision-language for reasoning, diffusion transformer for motor control). Trained on Isaac GR00T synthetic data.
Generalist AI (Gen‑AI) Building cross-domain embodied foundation models for both sim and real. Focus on generalization across robot morphologies.
Google DeepMind (RT-2, RT-H) Pioneered VLA approach with RT-2 (fusing vision-language pretraining with robotic control). RT-H for hierarchical task decomposition. Set the paradigm that most current VLA research follows.

The build day panel was nearly unanimous in skepticism toward near-term embodiment-agnostic generalism. The practical path today remains domain-specific policies deployed to specific form factors in specific environments, with generalization emerging gradually as architectures and data mature.

Layer 4: Deployment & Management of Autonomous Systems

Once trained and validated, AI models must be deployed to autonomous systems with robust fleet management capabilities. This layer handles over-the-air model updates, agent policy management, edge-cloud orchestration, and ongoing monitoring. The deployment phase requires careful consideration of edge computing constraints, network connectivity (or lack thereof), and security. Systems must operate reliably when disconnected from central management while still receiving updates and reporting telemetry. This layer also includes the edge processors that sit inside the robot and the cloud infrastructure optimized for physical AI workloads.

Platform Role
NVIDIA Jetson AGX Thor Blackwell-based edge AI module (128 GB memory, 2,070 FP4 TFLOPS, 130W). Adopted by Unitree, AgiBot, Galbot, EngineAI, UBTECH. Designed to run full VLA stack on-robot.
Nebius, STN, NVIDIA DGX Purpose-built physical AI cloud infrastructure. Nebius offering GPU-optimized clusters with high-throughput storage for sim workloads. STN providing GPU-One service to Skild AI. NVIDIA DGX/OVX for training and simulation.
Formant Cloud-based robot fleet management and data platform. Formant provides a "single pane of glass" to deploy, monitor, teleoperate, and analyze heterogeneous robot fleets. Processes 5B+ data points per month. Customers include John Deere (Blue River Technology), BP, Burro, Scythe Robotics, Knightscope, and SoftBank Robotics.
InOrbit AI AI-powered robot orchestration platform for managing heterogeneous robot fleets across enterprises. InOrbit positions itself as the enterprise operations layer—it bridges robots with business systems (WMS, ERP, WES) and orchestrates mixed fleets from different vendors through a single platform. Deploys across four continents. Customers include Colgate-Palmolive and Genentech (Roche). Backed by L'ATTITUDE Ventures and Globant Ventures.

Layer 5: Edge Inference & Application Domains

The final layer brings intelligence to the point of action. Edge-based inference enables real-time analysis and actuation without network dependency—critical for collision avoidance, balance control, and emergency stops where milliseconds matter. This layer is also where robots meet end users: the specific industries, environments, and tasks where embodied AI is being deployed today. Real deployments in 2025–2026 remain concentrated in logistics, automotive manufacturing, and controlled service environments. Consumer home deployment is just beginning.

Domain Key Players & Status
Logistics & Warehousing Most mature deployment domain. Includes Agility Digit at Amazon/GXO, UBTECH Walker S2 at Foxconn, and Figure at BMW. Tasks: tote handling, bin picking, material transport. Figure’s Helix demonstrated faster-than-human package sorting.
Automotive Manufacturing Apptronik Apollo at Mercedes-Benz; Figure 03 at BMW Spartanburg; UBTECH at BYD, Geely, FAW-VW. Tasks: assembly assist, parts handling, inspection.
Industrial (Skilled) GrayMatter for surface finishing (aerospace/defense). 90% of such tasks still done by hand. Targets high-risk, high-skill work where human error and injury are significant. Sim-to-real pipeline with field-deployed data collection on material domains.
Consumer Home Earliest stage. 1X NEO in limited home testing; Sunday Robotics beta end of 2026; Figure targeting select homes. Tasks: kitchen cleanup, tidying, laundry, organization.
Entertainment & Testing Includes UFB’s humanoid cage-fight league, AgiBot performances at China’s Spring Festival Gala, and Unitree concerts with pop stars.
Healthcare/Rehabilitation Fourier GR-2 purpose-built for patient support (50 kg payload). PAL Robotics for hospital logistics. Still largely pilot-stage.

Revising the Data Thesis

Building upon the discussions from the Autonomous Robot Build Day and our subsequent analysis, we are expanding the memo to address the emerging technical paradigms in World Action Models, the maturing data infrastructure stack, and the strategic maneuvers of the industry’s largest players.

World Action Models: Challenging the Teleoperation Moat

The recent release of the DreamZero paper, which introduces a 14-billion parameter World Action Model (WAM), represents a potential inflection point that could redefine the competitive landscape of physical AI. If the claims made in this research hold true, they suggest that training on synthetic data and diverse internet-scale video is significantly more effective than the industry previously imagined. This realization directly threatens the "data moat" strategy currently pursued by companies heavily invested in proprietary teleoperation and physical manufacturing. If the efficacy of World Action Models continues to improve at this pace, the traditional competitive advantages derived from high-fidelity human demonstrations and vertically integrated hardware-data loops may be rapidly eroded by players who can master "neural simulation" and zero-shot transfer.

During our Reading Room sessions, we explored the technical nuances of this shift, specifically why the co-training of world modeling and action prediction is a structural necessity rather than a mere design choice. Participants noted that models trained purely on action data tend to embed all their physical knowledge into a narrow, task-specific signal, leaving significant representational leverage on the table. Conversely, models trained purely on video often learn "hacky physics": a model that does not understand collision dynamics might, for example, simply blend two objects together visually when they make contact. DreamZero addresses this by forcing a richer representation of the scene through co-training on both future world states and continuous actions, making "objectness" a first-class concept within the model's weights. While DreamZero provides compelling evidence for the performance gains of this architecture, our discussion highlighted a critical missing piece: the field still requires a clear and definitive ablation study that compares the same backbone and data with and without the world modeling loss to truly prove its dominance over simple scaling.
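A toy numeric sketch of the co-training objective: one shared backbone (not shown) is penalized for both action error and future-frame error, so its representation cannot collapse to an action-only signal. The function names, the use of MSE, and the 0.5 weight are our illustrative assumptions, not DreamZero's published losses.

```python
import numpy as np

def co_training_loss(pred_actions, true_actions,
                     pred_next_frame, true_next_frame,
                     world_weight=0.5):
    """Combined objective: action-regression term plus a weighted
    future world-state prediction term. Setting world_weight=0
    recovers a pure behavior-cloning loss; the ablation the Reading
    Room called for is essentially this comparison run at scale."""
    action_loss = float(np.mean((pred_actions - true_actions) ** 2))
    world_loss = float(np.mean((pred_next_frame - true_next_frame) ** 2))
    return action_loss + world_weight * world_loss, action_loss, world_loss

# Perfect world prediction, imperfect actions: only the action term fires
total, a_loss, w_loss = co_training_loss(
    np.zeros(7), np.ones(7), np.full(64, 0.5), np.full(64, 0.5))
```

The point of the sketch is structural: because both terms backpropagate through the same representation in a real model, the world-prediction loss acts as a regularizer that keeps physical knowledge from being compressed into the action head alone.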

This paradigm shift also exposes the current frailty of robotics benchmarking. The Reading Room consensus was that existing evaluation methods are fundamentally broken; even within a single lab, running the same model against itself across different days can produce wildly variable results due to hardware wear, lighting conditions, and operator variables that are rarely modeled or reported. To move toward a more scientifically rigorous standard, the industry must adopt blind A/B testing as the default practice. Furthermore, the long-term answer lies in the creation of a "Robot Arena"—a portable, standardized benchmarking environment where any model can be evaluated across diverse environments. This would allow for zero-shot evaluation to become the baseline metric, where a model is placed in front of a robot it has never encountered to perform a task without any fine-tuning.
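The blind A/B discipline the consensus calls for is procedurally simple; the hard part is enforcing it. A minimal protocol sketch, where trial IDs, function names, and the seed are our own illustrative choices:

```python
import random

def blind_ab_schedule(n_trials, policies=("A", "B"), seed=7):
    """Balanced, shuffled trial-to-policy assignment. Operators score
    trials by opaque ID only; the mapping is revealed after scoring.
    Illustrative sketch, not any existing benchmark's tooling."""
    assert n_trials % len(policies) == 0, "keep arms balanced"
    labels = list(policies) * (n_trials // len(policies))
    random.Random(seed).shuffle(labels)
    return {f"trial_{i:03d}": p for i, p in enumerate(labels)}

def unblind(assignment, scores):
    """Aggregate per-policy success rates once scoring is complete."""
    by_policy = {}
    for trial, policy in assignment.items():
        by_policy.setdefault(policy, []).append(scores[trial])
    return {p: sum(v) / len(v) for p, v in by_policy.items()}
```

Randomizing trial order within a single session also absorbs some of the day-to-day drift (hardware wear, lighting, operator variance) that makes sequential same-day comparisons unreliable.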

The transition to these 14-billion parameter models suggests that scale is beginning to settle technical arguments that small-scale experiments never could. In our discussions, we observed that much of the skepticism around cross-environment transfer and multi-task generalization originated from negative results observed at smaller scales. However, at the scale demonstrated by DreamZero, what previously appeared to be negative interference between tasks is transforming into constructive transfer. This is the same scaling dynamic that reshaped natural language processing, and it suggests that any research conclusions drawn from smaller models should be treated as outdated hypotheses rather than settled findings.

Teleoperation: The Capstone Is More Nuanced

We also found that teleoperation itself is evolving as a data axis. Henry from Sunday Robotics described a progression from in-house teleoperation to scaled demonstration collection. Sunday builds custom teleoperation gloves for high-quality data capture, then uses transformer and diffusion policy models to learn from the collected demonstrations. The key insight: teleoperation data quality matters more than quantity. Their approach emphasizes careful demonstration design over brute-force collection.

However, Mustafa from Nomadic ML pushed back on the framing that you simply need more data. His core argument was that most robotics companies are already sitting on terabytes of operational footage they cannot efficiently extract value from. The real bottleneck may not be collection but curation: finding the 0.1% of frames that represent novel failure modes, edge cases, and high-value training signal. Nomadic’s VLM-based inference engine automates this triaging at superhuman speed and consistency.

Data Infrastructure As A Strategic Bottleneck

This shift toward massive, simulation-heavy models is occurring alongside a maturing infrastructure stack that is increasingly designed to handle the "data nervous system" of a global robot fleet. For a company like Figure AI, the stack is a vertically integrated engine that moves from high-fidelity teleoperation captured via tools like the Apple Vision Pro to massive-scale simulation in NVIDIA Isaac Sim and Cosmos. This process involves a critical curation layer where the 0.1% of high-value edge cases are extracted from terabytes of operational footage, a task currently being automated by platforms like Nomadic ML. The backend for this operation requires a specialized hardware-software hybrid: high-performance storage solutions like VAST Data are used to eliminate I/O bottlenecks for multimodal training, while Foxglove provides the observability needed to sync and debug petabytes of real-time sensor data.

NVIDIA’s strategic positioning within this stack is a direct evolution of its CUDA playbook. By open-sourcing foundational weights and models like GR00T and Cosmos, NVIDIA is effectively setting the industry standard for the "plumbing" of robotics. This approach lowers the entry barrier for developers while ensuring that the entire industry remains tethered to NVIDIA’s proprietary silicon and simulation ecosystems for deployment. Whether a startup builds a humanoid or a specialized industrial arm, the underlying intelligence is increasingly likely to run on NVIDIA Blackwell-powered Jetson Thor modules. This creates a scenario where NVIDIA captures the value of the robotics boom through a "compute tax," regardless of which individual robot OEM eventually wins the market.

Revised Theses

Thesis 1: The data pyramid is validated, but add a fourth layer. Our original three-layer model (simulation → human video → teleoperation) needs augmentation with a data intelligence layer—tools for automated curation, behavior triaging, and signal extraction from operational footage. Without this layer, the raw data flywheel stalls at fleet scale. Companies like Nomadic ML for automated curation, Foxglove for data management and observability, Segments.ai for multi-sensor annotation, and infrastructure providers like LiveKit enabling real-time data transport, are as critical to the stack as the companies generating the data itself.

Thesis 2: Vertical integration remains the winning strategy, but the axis of integration is shifting. Our original thesis emphasized companies that integrate across simulation, video, and teleoperation. The build day suggests the more important integration axis is data collection → data curation → model training → deployment → edge-case capture. Sunday’s trajectory—from building data infrastructure to optimizing consumer experience—illustrates this pipeline maturing in real time. At the global level, companies like Figure and Tesla that own the fleet, the data pipeline, and the model stack maintain structural advantages.

Thesis 3: Generalist embodied AI is a 3+ year horizon; the near-term value is in domain-specific deployment. The panel was nearly unanimous in its skepticism toward near-term embodiment-agnostic generalist models. The practical path is domain-specific policies deployed to specific form factors in specific environments—Sunday in homes, GrayMatter in aerospace factories, UFB in combat arenas—with generalization emerging gradually as data and model architectures mature. Startups should architect for depth in a single domain, not breadth across many.

Thesis 4: Safety and trust are product features, not compliance checkboxes. Sunday’s decision to forgo legs for passive stability, their layered software safety guards, and their beta deployment strategy (a small group of enthusiastic early adopters providing feedback before mass production) reflect a product-level commitment to safety that goes beyond regulatory compliance. For consumer robotics, trust must be engineered into the hardware and the go-to-market rather than treated as an engineering afterthought.

What We’re Watching

The Autonomous Robot Build Day confirmed our central claim—data, not hardware, is the binding constraint—while revealing that the data problem is richer than a simple scarcity framing. The challenge is not just collecting enough data, but building the infrastructure to extract signal from noise, close the loop between deployment failures and training improvements, and curate at the scale that fleet deployment demands.

We are now tracking five specific indicators: first, whether Figure’s Helix VLA and proprietary data flywheel delivers measurable performance advantages over companies using external foundation models; second, whether the data curation layer (Nomadic ML, Foxglove, Segments) proves to be a standalone business or gets absorbed into the vertically integrated players; third, whether China’s hardware cost advantage translates into an AI capability advantage as Chinese OEMs invest in their own VLA models; fourth, whether any company achieves a credible consumer home deployment before ~2027; and fifth, whether the sim-to-real gap narrows enough for simulation-heavy approaches (Skild, NVIDIA GR00T) to compete with teleoperation-heavy approaches (Sunday, Figure) in manipulation-heavy domains.

The embodied AI stack is crystallizing. The five software layers, from sensor digitization and storage through labeling and curation, simulation and model training, and fleet deployment to edge inference, are becoming legible. The companies that master the transitions between these layers, especially the data-to-model and deployment-to-data feedback loops, will define the next era of physical AI.

Acknowledgments

Thanks to our panelists Tony Zhao, Jason Ma, Varun Ganapathi, and Ariyan Kabir, and to David Zhao from LiveKit for their contributions. Special thanks to Nomadic ML for sponsoring the hackathon and making their platform available to builders. This analysis reflects perspectives shared at the AGI House Autonomous Robot Build Day, February 7, 2026, supplemented by market research as of March 2026.