
Eleven days after Google I/O 2026, AGI House and the Google DeepMind team put a room full of builders in front of a stack that had been public for less than two weeks (Gemini 3.5 Flash, Antigravity 2.0, the Managed Agents API, Gemini Omni Flash, and Science Skills), with the engineers who built it on the floor. The premise was deliberately narrow: ship something real on the new tools while the people who made them stand next to you. By the time demos started at 8pm, three tracks had produced working agents for how software gets built, how content gets made, and how science gets done.

See our technical pre-event memo.

What made the day worth being in the room was the range of people in it. The morning put the researchers pushing the capability frontier on the same stage as the leaders deploying AI inside regulated, high-stakes industries — and across the panels, the one-on-one conversations, and the demos, the discussion kept returning to the physical world: science, medicine, biology, and what it means to build all of this for human benefit.

The morning ran three sessions back to back: a fireside with Sergey Brin (moderated by AGI House's Rocky Yu), a Google DeepMind panel with Shane Gu, Jay Whang, and Benoit Schillings (moderated by Jesus Lares), and an enterprise panel with AI leaders from GSK, Gilead, Eli Lilly, and Capital One (moderated by Nina Liu). Among the guests were Teresa Nguyen, the Stanford physician-scientist building AI and robotics for pediatric medicine, and Eric Nguyen, founder of Radical Numerics and part of the team behind Evo, the model that learned to read and write DNA. We sat down with both of them one-on-one, along with Benoit.

Rocky Yu (Founder of AGI House) and Sergey Brin (Co-Founder of Google)

The frontier

After Sergey Brin's opening fireside, where one of the questions put to him was what he would build, and how, if he were starting out in his twenties today, the DeepMind panel turned to where the research itself is heading.

Benoit Schillings, VP of Research at DeepMind and former CTO of Google X

Benoit Schillings, VP of Research at DeepMind and formerly CTO of Google X, described innovation as close to a search problem: kissing frogs to find a prince, where the only real levers are a better pre-filter for which frogs to kiss and the speed at which you can kiss them. The cultural corollary is to fail fast and make failure safe. "If you do not allow people to fail, they will not try hard enough," he said; a Formula One driver who never crashes isn't pushing hard enough. He applies that posture to himself, too: in our conversation he described deliberately changing roles every few years by asking "what's the next very scary job I could take?"

On code, he was blunt ("code writing is done") and located the live frontier in the layers above it: software engineering as the management of complexity, and architecture above that. The bet he's most excited about is a riskier one: that reasoning doesn't have to happen in words. Humans, he points out, think first and explain in words second; the thinking itself is something else, and his team has spent the last year trying to understand and reproduce it. It is roughly a one-in-five shot, and one he believes will pay off.

Shane Gu — whose work helped introduce step-by-step reasoning in LLMs and who now co-leads DeepMind's OmniThinking effort — framed the move from research to frontier deployment as a change in the objective function: away from novelty and toward impact divided by complexity, the least new machinery for the most impact. He has spent years on self-improvement (his paper "Large Language Models Can Self-Improve" predates the current wave) and pointed to where the signal is richest now — verifiable domains like coding, and reasoning that isn't confined to text, as in a recent paper of his, "Video Models are Zero-Shot Reasoners."

Jay Whang, who works on the Omni line and Nano Banana, made the case for world models as the substrate for the next jump. A strong enough world model, he argued, is grounded in genuine world knowledge and reasoning, and could eventually stand in for parts of the test bed that natural science runs on, simulating enough physics to accelerate the experimental loop rather than only generating video.

The thread under all three: the frontier is getting less about producing text and more about reasoning, simulating, and acting in the world. Benoit put the stakes plainly when we asked what's non-negotiable for AGI — "empathy for humans," and a deep sense of what is actually good for them.

Drawing the line

It is rare to get four AI leaders from regulated industries this candid on one stage. The enterprise panel — Kim Branson (GSK), Patrick Loerch (Gilead), Gregg Spivey (Eli Lilly), and Ankur Prasad (Capital One) — spent most of its time on a single question: where to draw the line between automated and human decisions when the cost of being wrong is high.

Gregg Spivey described TuneLab, Lilly's effort to take models that normally only the largest pharma companies hold (small-molecule ADMET predictors, which estimate absorption, distribution, metabolism, excretion, and toxicity and serve as the "is this a viable drug" workhorses) and make them freely available to smaller biotechs, using federated learning so that no one ever sees anyone else's data. (A nice historical note: much of the founding work on federated learning came out of Google in 2016.)
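
To make "no one ever sees anyone else's data" concrete, here is a minimal sketch of the core idea. It assumes the federated averaging (FedAvg) scheme described in that 2016 Google work, not whatever TuneLab actually runs: each site trains on its own data and shares only model parameters, which a coordinator averages weighted by sample count. The two "sites," their toy data, and the helper names (`local_train`, `fed_avg`) are hypothetical.

```python
# Minimal FedAvg-style sketch (hypothetical sites and data, not TuneLab's code).
# Two sites fit a shared linear model y = w*x + b without pooling their data:
# only (w, b) ever leaves a site; raw (x, y) pairs stay local.

def local_train(w, b, xs, ys, lr=0.05, steps=5):
    """Run a few gradient-descent steps on one site's private data (MSE loss)."""
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2 * sum(errs) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def fed_avg(sites, rounds=400):
    """Coordinator loop: broadcast weights, collect local updates, average by sample count."""
    w, b = 0.0, 0.0
    total = sum(len(xs) for xs, _ in sites)
    for _ in range(rounds):
        updates = [(local_train(w, b, xs, ys), len(xs)) for xs, ys in sites]
        w = sum(uw * n for (uw, _), n in updates) / total
        b = sum(ub * n for (_, ub), n in updates) / total
    return w, b


# Toy private datasets (hypothetical), both generated from y = 2x + 1.
site_a = ([0.0, 0.5, 1.0], [1.0, 2.0, 3.0])
site_b = ([1.5, 2.0, 2.5, 3.0], [4.0, 5.0, 6.0, 7.0])

w, b = fed_avg([site_a, site_b])
print(f"w={w:.3f}, b={b:.3f}")  # prints w=2.000, b=1.000
```

The coordinator recovers the shared relationship while only ever handling parameters. Real deployments typically layer secure aggregation or differential privacy on top, because parameters alone can still leak some information about the underlying data.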

Kim Branson, who leads AI/ML at GSK, described rolling out an AI-scientist program across the company and, separately, the bepirovirsen story: using machine learning to identify the patients for whom the chronic hepatitis B drug could deliver a functional cure, in what he half-jokingly called the world's most expensive clinical trial. His advice for a regulated setting was to engage regulators early, because "it's not adversarial": they want to help you do new things safely.

Patrick Loerch, who leads an 1,100-person clinical data science organization at Gilead, offered the cleanest framing of the day: he's less interested in autonomous AI than in collaborative AI. "Drug discovery and development is a team sport," he said; the win is AI as a team member that surfaces insight across pathology, pharmacology, and the clinic, not an autonomous decision-maker dropped into a GxP environment.

Ankur Prasad of Capital One brought the asymmetric-risk lens from finance: a model that is 95% accurate can outperform a human who is 75% accurate and still be the wrong thing to fully automate, because the model's remaining 5% of errors can be catastrophic in ways a human's mistakes wouldn't be. So the human stays in the loop. And in a regulated shop, a regulator can point at a single line of code and ask what it does, which means auto-generated code still has to be explainable.
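
The asymmetric-risk argument comes down to expected loss per decision (error rate times cost per error) rather than accuracy alone. A minimal sketch with made-up costs; the dollar figures and function names below are illustrative assumptions, not Capital One numbers or methods:

```python
def expected_loss(error_rate, cost_per_error):
    """Expected cost per decision: how often you are wrong times how much it hurts."""
    return error_rate * cost_per_error


def break_even_cost_ratio(human_error_rate, model_error_rate):
    """How many times costlier a model error can be before automation loses on expected loss."""
    return human_error_rate / model_error_rate


# Illustrative, made-up costs: human mistakes are frequent but bounded,
# while the model's rare failures are assumed to be catastrophic.
human = expected_loss(0.25, 1_000)    # 75% accurate, $1,000 per error
model = expected_loss(0.05, 100_000)  # 95% accurate, $100,000 per error

print(human, model)                       # 250.0 5000.0
print(break_even_cost_ratio(0.25, 0.05))  # 5.0
```

Under these assumptions the model makes five times fewer errors yet carries 20 times the expected loss. Any per-error cost above five times the human's tips the balance against full automation, which is why keeping a reviewer on the model's outputs, and so capping the tail cost, matters more than the headline accuracy.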

Kim Branson (GSK), Patrick Loerch (Gilead), Gregg Spivey (Eli Lilly), and Ankur Prasad (Capital One)


What are we building for?

The two guests we spent the most time with one-on-one both work where AI meets the physical world — and both kept circling the same question.

Teresa Nguyen trained as a medicinal chemist at Genentech, where she patented a series of chronic-pain drugs, before going to medical school and becoming an anesthesiologist. Her research now brings low-cost, open-source robots to hospitalized children: kids on cardiac bypass who hadn't wanted to get out of bed were getting up to dance with robot dogs and wheeling their IV poles around the ward — and, it turned out, mobilizing more and needing fewer anxiety and depression medications. Her warning to builders is that values aren't neutral: "with every innovation, the inventor's values are instilled into the invention." She also sees an enormous, underused dataset hiding in plain sight — the operating room, the most data-rich physiological environment we have. Map heartbeat-to-heartbeat OR data onto wearables, she argues, and your watch becomes "an ICU monitor on your wrist." Her non-negotiable for AGI is that it should encourage humans to think for themselves — and her question for the engineer vibe-coding at 2am is simply: what am I building for?

Eric Nguyen, founder of Radical Numerics, comes at the physical world from the other direction — the genome. His team built Evo, a model that reads and writes DNA, which was used to generate a complete bacteriophage genome from scratch. His thesis is that biology is inherently multimodal: DNA, RNA, and proteins are different "languages" of one system, and the field has held itself back by specializing in them one at a time instead of modeling the whole. He also punctures a common myth — that biology is data-poor. "There is more DNA online than all of the text on the internet," he points out; the human genome alone is three billion letters, roughly thirty thousand books. Having spent time on DeepMind's Co-Scientist team, he's bullish on the field's turn toward science, with one insistence: capability and biosecurity are inseparable. If you can rewrite the fabric of biology, you build the safeguards in the same breath.

The build

After lunch, the DeepMind team ran hands-on workshops, builders gave lightning pitches, and it was off to the races. Teams had until the evening demos to ship across three tracks (agentic coding and builder tools, multimodal and creator tools, and AI for science) on the just-released stack. Judges picked a winner per track; because the science track produced two unusually strong projects, we split it into two awards.

ZeroG (Agentic Coding, $3,000) attacks the cold-start tax: every Antigravity session starts from zero, so every engineer's agent rediscovers the same Google Cloud deploy paths — the same gcloud calls, IAM settings, and Cloud Functions errors. ZeroG is a shared-memory layer that records what one Antigravity agent did — the tool sequence and the outcome — so the next agent on a similar task inherits the pattern instead of rediscovering it. One agent learns; the team inherits the trace. It plugs into Antigravity 2.0 through a SKILL.md and an MCP proxy, with a graph network learning tool-transition patterns from real runs.
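
A minimal sketch of the "one agent learns, the team inherits the trace" idea. It assumes a plain count-based tool-transition graph standing in for the learned graph network ZeroG actually uses, and it does not model Antigravity's SKILL.md or MCP interfaces. `TraceMemory` and the tool names are hypothetical.

```python
from collections import Counter, defaultdict

START, END = "<start>", "<end>"


class TraceMemory:
    """Shared memory across agents: counts tool-to-tool transitions from successful runs.

    A simple frequency graph used as a stand-in for a learned graph network (assumption).
    """

    def __init__(self):
        self.transitions = defaultdict(Counter)  # tool -> Counter of next tools

    def record(self, tools, succeeded):
        """Store one agent's tool sequence; failed runs are ignored."""
        if not succeeded:
            return
        path = [START, *tools, END]
        for prev, nxt in zip(path, path[1:]):
            self.transitions[prev][nxt] += 1

    def suggest(self, last_tool=START):
        """Most frequent next tool after `last_tool`, or None if it was never seen."""
        options = self.transitions.get(last_tool)
        return options.most_common(1)[0][0] if options else None

    def replay(self, max_steps=20):
        """Greedily follow the most common transitions to propose a full plan."""
        plan, tool = [], START
        for _ in range(max_steps):
            tool = self.suggest(tool)
            if tool is None or tool == END:
                break
            plan.append(tool)
        return plan


# Hypothetical runs by different engineers' agents on similar deploy tasks.
memory = TraceMemory()
memory.record(["enable_apis", "grant_iam_role", "deploy_function"], succeeded=True)
memory.record(["enable_apis", "deploy_function"], succeeded=False)  # e.g. missing IAM role
memory.record(["enable_apis", "grant_iam_role", "deploy_function"], succeeded=True)

# A new agent on a similar task starts from the learned pattern instead of from zero.
print(memory.replay())  # ['enable_apis', 'grant_iam_role', 'deploy_function']
```

Filtering on outcome is the important design choice: only traces that reached the goal shape the suggestions, so a failed shortcut never becomes the team's default.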

xDots (Multimodal & Creator Tools, $3,000) reframes creation as a sequence of decisions rather than prompt-wrangling, through a four-step workflow — Spark, Story, Scene, Screen. Under the hood it orchestrates several models, with Gemini driving the storyboarding and Veo generating the video. The demo made the point: a single sentence ("Sergey, what would you build if you were in 20s now?") plus four photos shot at the hackathon went in, and a storyboarded short film came out in just a few minutes. (The result is worth watching.)

GPCRclaw (AI for Science, $2,000) is a self-improving agent for protein design aimed at GPCRs — the receptor family behind roughly a third of marketed drugs, and one of the hardest to target. Built on Gemini and orchestrating specialized protein models like RFantibody and ESMFold2, it explores the biology of three real human targets, generates candidate nanobody binders, scores them in silico, and improves its own design strategy across rounds rather than producing one-off candidates. The team included the Head of Science at AWS HIL.
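
What separates a "self-improving" design loop from one-off generation is that the sampling strategy itself is updated from each round's scores. A minimal sketch using the cross-entropy method, which is our assumption rather than GPCRclaw's actual algorithm, with a toy scorer standing in for the in silico models (no RFantibody or ESMFold calls; `toy_score`, `TOY_TARGET`, and `design_loop` are hypothetical):

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TOY_TARGET = "MKVLAGTEWR"             # hypothetical "ideal" sequence the toy scorer rewards


def toy_score(seq):
    """Stand-in for in silico scoring: number of positions matching TOY_TARGET."""
    return sum(a == b for a, b in zip(seq, TOY_TARGET))


def design_loop(rounds=25, samples=500, elites=50, keep=0.3, seed=0):
    """Cross-entropy-method loop: sample, score, then shift the strategy toward the elites."""
    rng = random.Random(seed)
    # The design strategy: one probability distribution over residues per position.
    probs = [[1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS) for _ in TOY_TARGET]
    best_seq, best, history = "", -1, []
    for _ in range(rounds):
        # Generate candidates by sampling each position independently.
        columns = [rng.choices(AMINO_ACIDS, weights=w, k=samples) for w in probs]
        batch = ["".join(col[j] for col in columns) for j in range(samples)]
        ranked = sorted(batch, key=toy_score, reverse=True)
        history.append(sum(map(toy_score, batch)) / samples)
        if toy_score(ranked[0]) > best:
            best_seq, best = ranked[0], toy_score(ranked[0])
        # Update the strategy: blend old probabilities with residue frequencies among elites.
        top = ranked[:elites]
        for i, w in enumerate(probs):
            for k, aa in enumerate(AMINO_ACIDS):
                freq = sum(seq[i] == aa for seq in top) / elites
                w[k] = keep * w[k] + (1 - keep) * freq
    return best_seq, best, history


best_seq, best, history = design_loop()
print(best_seq, best)
print(f"mean score: round 1 = {history[0]:.2f}, round {len(history)} = {history[-1]:.2f}")
```

Round 1 is pure random sampling; later rounds concentrate probability on residues that scored well, which is "improving its own design strategy across rounds" in miniature. A real pipeline would swap `toy_score` for structure- and binding-based scores.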

LabSpine (AI for Science, $2,000) is a concurrent agent swarm for designing lab-automation workcells, built on the Managed Agents API and Gemini 3.5 Flash, with Google AI Studio and Google Cloud (Compute Engine and Monitoring) underneath. Its trick is composition: parallel agents propose, validate, and repair designs against one shared model, and rebase their work so improvements compound instead of clobbering each other. The output isn't a sketch but a runnable Opentrons protocol (checked against the Opentrons simulator), along with a deck diagram, a bill of materials, and diagnostics.
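
"Rebase instead of clobber" maps naturally onto optimistic concurrency control: each agent works from a versioned snapshot of the shared design, and a commit is accepted only if nobody else committed in the meantime; otherwise the agent re-reads the latest state, reapplies its change, and repairs any conflicts. A minimal sketch under that assumption; the toy 11-slot deck, the labware names, and the helpers are hypothetical, and nothing here touches the Managed Agents API or the Opentrons API.

```python
import threading

NUM_SLOTS = 11  # toy deck with slots 1..11 (hypothetical layout)


class SharedDesign:
    """One shared workcell model that every agent reads from and commits to."""

    def __init__(self):
        self.version = 0
        self.deck = {}  # slot number -> labware name
        self._lock = threading.Lock()

    def snapshot(self):
        with self._lock:
            return self.version, dict(self.deck)

    def try_commit(self, base_version, new_deck):
        """Accept the change only if no one has committed since base_version."""
        with self._lock:
            if self.version != base_version:
                return False
            self.deck, self.version = new_deck, self.version + 1
            return True


def propose_and_repair(deck, items):
    """Place each (name, preferred_slot); if the slot is taken, repair by moving to the next free one."""
    new_deck = dict(deck)
    for name, slot in items:
        if len(new_deck) >= NUM_SLOTS:
            raise ValueError("deck is full")
        while slot in new_deck:
            slot = slot % NUM_SLOTS + 1  # wrap from slot 11 back to slot 1
        new_deck[slot] = name
    return new_deck


def agent(design, items, max_attempts=100):
    for _ in range(max_attempts):
        base, deck = design.snapshot()  # always start from the latest shared state
        if design.try_commit(base, propose_and_repair(deck, items)):
            return True
        # Another agent committed first: loop to rebase onto its version and retry.
    return False


design = SharedDesign()
work = [
    [("tip_rack_300ul", 1), ("reservoir", 2)],
    [("plate_96", 2), ("tip_rack_20ul", 3)],
    [("heater_shaker", 2)],
]
threads = [threading.Thread(target=agent, args=(design, items)) for items in work]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(design.version, dict(sorted(design.deck.items())))
```

Whatever order the threads run in, all five items land on the deck and the version ends at 3: a late agent never overwrites an earlier placement; it rebases onto it and moves its own item if needed.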

What it added up to

For a day, the people compressing the loop between idea and capability were in the same building as the people who have to decide what's safe to ship into a clinic or a bank — and the guests who insist the whole point is human health and human agency. The most interesting work, in the panels and on the demo stage alike, lived where those concerns meet. That intersection is where AGI House likes to convene: frontier labs on one side, the builders and operators who put the work into the world on the other.

Thanks to the Google DeepMind team for building the day with us — and to everyone who shipped something real on tools that were barely two weeks old.

11 Days After I/O: Inside the Google DeepMind Enterprise Build Day
