AGI Dinner Series | June 11, 2026

Last week, AGI House hosted another edition of our Frontier AI dinner series: a Jeffersonian-style conversation over omakase with researchers and builders from Google DeepMind, xAI, OpenAI, Thinking Machines Lab, Meta Superintelligence Labs, 1X, and a handful of founders working on everything from generative operating systems to AI-native investing.

Per our house rules, ideas are shareable but speakers are not identified. What follows is a summary of what was said, not who said it. As always, the goal was simple: learn something and make friends. We did both.

The loudest ideas in AI are not always the most important ones.

In an opening round of “overhyped or underhyped,” a group of researchers and builders pushed back on several of the industry’s dominant narratives. Long-running autonomous agents? Overhyped. Fear-driven speculation about existential risk? Also overhyped, according to several people in the room.

What deserves more attention is less cinematic and more immediate: fast models, richer interaction, better knowledge transfer, broader access to intelligence, and systems that recognize the limits of what they know.

Taken together, the conversation suggested a different way to understand the next phase of AI. Progress may depend less on building a single model that thinks forever and more on building systems that respond quickly, learn actively, specialize intelligently, and fit naturally into human feedback loops.

Speed Is a Capability

Model quality is usually discussed in terms of benchmark scores, reasoning depth, or parameter count. But users experience intelligence through latency.

Time to first token is especially important. A model that begins responding immediately feels more useful, more conversational, and easier to steer. That responsiveness creates a tighter feedback loop: users can react, correct, and refine their intent before the system travels too far in the wrong direction.

This is why very fast, low-latency models came up repeatedly as underhyped. Speed is not merely an infrastructure optimization. It changes the interface and, with it, the range of tasks a model can handle well.

The discussion of Gemini 3.5 Flash illustrated the point. According to participants, its performance gains came primarily from post-training rather than from a new pretrained base: more reinforcement-learning compute, more environments, and new techniques for coding and agentic tasks. Serving the model on Google’s fastest inference TPU line reportedly helped reduce time to first token.

The broader lesson is that model development now involves a cost-latency-performance frontier. Effort controls, whether labeled low, medium, high, or extra high, give users a way to decide how much time and compute a task deserves. Not every prompt needs maximum reasoning. Many benefit more from an immediate, good-enough answer.

That does not make thinking free. A model forced to answer before it has reasoned may produce an error and then visibly correct itself. But allocating effort based on prompt difficulty appears increasingly practical. The challenge is not to maximize thinking in every case. It is to spend it where it matters.
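To make the idea of allocating effort by prompt difficulty concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the keyword list, the scoring formula, and the thresholds are hypothetical, and a production router would more likely use a learned classifier than a heuristic like this one.

```python
from enum import Enum


class Effort(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    EXTRA_HIGH = "extra high"


# Hypothetical cue words that suggest a prompt needs more reasoning.
HARD_CUES = ("prove", "debug", "refactor", "optimize", "derive")


def estimate_difficulty(prompt: str) -> float:
    """Crude, illustrative difficulty score in [0, 1]."""
    words = [w.strip(".,?!:") for w in prompt.lower().split()]
    length_score = min(len(words) / 200, 1.0)
    cue_score = min(sum(w in HARD_CUES for w in words) / 2, 1.0)
    return 0.5 * length_score + 0.5 * cue_score


def choose_effort(prompt: str) -> Effort:
    """Map the difficulty score to an effort level (thresholds are assumed)."""
    score = estimate_difficulty(prompt)
    if score < 0.1:
        return Effort.LOW
    if score < 0.3:
        return Effort.MEDIUM
    if score < 0.6:
        return Effort.HIGH
    return Effort.EXTRA_HIGH
```

With these assumed thresholds, a short factual question lands on low effort, while a prompt asking to debug and refactor code is routed to high effort.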

The Interface Is Becoming Continuous

Most current AI products still inherit the turn-taking structure of chat: a person sends a message, the model replies, and the cycle repeats. Human communication is much less orderly.

We interrupt. We hesitate. We gesture. We change direction mid-sentence. Tone, timing, silence, and facial expression all carry information. That makes audio, nonverbal communication, real-time intelligence, and human-computer interaction some of the most promising underexplored areas in AI.

The technical debate is whether this kind of interaction must be trained natively into a model or can be assembled around a capable general-purpose model.

One view is that full-duplex interaction must be architectural. Real-time interaction data is unusually scarce, and asynchronous internet text is an imperfect substitute for live social behavior. Chunking interaction into intervals of roughly 180 milliseconds, near a human perceptual threshold, offers one possible unit for streaming multimodal systems.
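As a rough illustration of what a 180-millisecond unit means for a streaming system, the sketch below splits an audio buffer into fixed-length chunks. The 16 kHz sample rate and the function names are assumptions chosen for the example, not details from the discussion.

```python
def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = 180) -> int:
    """Number of audio samples covered by one chunk of the given duration."""
    return round(sample_rate_hz * chunk_ms / 1000)


def chunk_stream(samples: list, sample_rate_hz: int, chunk_ms: int = 180) -> list:
    """Split a buffer into consecutive chunks; the last one may be shorter."""
    size = samples_per_chunk(sample_rate_hz, chunk_ms)
    return [samples[i:i + size] for i in range(0, len(samples), size)]


# One second of (silent) audio at an assumed 16 kHz sample rate.
one_second = [0] * 16_000
chunks = chunk_stream(one_second, sample_rate_hz=16_000)
```

At 16 kHz, each chunk holds 2,880 samples, so one second of audio becomes five full chunks plus a shorter sixth. A model consuming input at this granularity gets between five and six opportunities per second to react.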

The counterargument is that the internet already contains vast amounts of implicit interaction. Posts, replies, edits, and conversations may provide much of the necessary structure, leaving alignment and product design as the main gaps rather than model architecture.

A practical answer may combine both approaches: a fast model handles immediate feedback while a slower, more capable model works on longer-horizon reasoning. This layered design would mirror human interaction, where rapid reactions and deliberate thought operate at different speeds.
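One way to picture the layered design is with two concurrent calls: a quick acknowledgment that returns almost immediately and a slower answer that arrives when it is ready. This is only a sketch. The two model functions are hypothetical stand-ins that sleep instead of calling a real system, and the delays are arbitrary.

```python
import asyncio


async def fast_model(prompt: str) -> str:
    """Hypothetical low-latency model: immediate, shallow feedback."""
    await asyncio.sleep(0.01)
    return f"[quick] Working on: {prompt}"


async def slow_model(prompt: str) -> str:
    """Hypothetical slower model: longer-horizon reasoning."""
    await asyncio.sleep(0.05)
    return f"[considered] Detailed answer to: {prompt}"


async def respond(prompt: str) -> list[str]:
    # Start the slow model first so it runs while the fast one replies.
    slow_task = asyncio.create_task(slow_model(prompt))
    messages = [await fast_model(prompt)]
    messages.append(await slow_task)
    return messages
```

Running `asyncio.run(respond("plan a trip"))` returns the quick message first and the considered one second. The total wait is set by the slow model alone, not by the sum of both.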

It also raises a basic interface question. In a continuous system with no clean turns, what counts as a prompt? The answer is not yet obvious, and current evaluation methods are poorly equipped to measure it. Full-duplex AI will need new benchmarks for interruption, responsiveness, social timing, and recovery, not just answer correctness.

Bigger Is Not Always Better

The industry’s public story often points toward one increasingly general model that can do everything. Its own product decisions tell a more complicated story.

Specialized models can be faster, cheaper, and better within a defined domain. Training distribution matters here: performance depends not only on the number of examples in a domain, but on their share of the overall token mix. Increasing emphasis on coding, for example, can trade off against performance elsewhere.
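The point about share versus count is easy to see with a little arithmetic. The token counts below are made up for illustration; only the mechanism matters.

```python
def shares(token_counts: dict[str, float]) -> dict[str, float]:
    """Convert per-domain token counts into fractions of the total mix."""
    total = sum(token_counts.values())
    return {domain: count / total for domain, count in token_counts.items()}


# Hypothetical training mix, in billions of tokens.
base = {"code": 200.0, "math": 100.0, "prose": 700.0}

# Triple the coding data and leave everything else untouched.
boosted = dict(base, code=600.0)

base_shares = shares(base)
boosted_shares = shares(boosted)
```

In this made-up mix, the math corpus is the same size before and after, yet its share of the mix falls from 10 percent to about 7 percent, and prose falls from 70 percent to 50 percent. Nothing was removed, but every other domain now makes up a smaller fraction of what the model sees.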

That makes specialization more than a temporary compromise. It may be the durable structure of the market.

The likely end state resembles cloud computing: a small number of companies provide enormous compute and general-purpose intelligence, while a much larger ecosystem builds focused models and applications on top. Even frontier labs reinforce this possibility when they release domain-specific variants alongside their flagship generalists.

The same logic applies to robotics. An embodied system does not operate at a single frequency. Balance and motor reflexes may require updates at 50 to 100 hertz. Reactive behavior and gesture response may run at 5 to 30 hertz. Language commands and long-horizon planning may need only 0.5 to 5 hertz.

Those requirements imply a layered architecture. Early general-purpose robots will probably use distinct systems for reflexes, reactions, and planning rather than one model controlling everything at every timescale.
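A toy scheduler shows how layers at different frequencies can share one clock. The specific rates below (100, 20, and 2 hertz) are assumptions picked from within the ranges mentioned above, and the layers only count their own updates instead of controlling anything.

```python
from collections import Counter

BASE_HZ = 100  # the fastest loop sets the base tick rate

# Assumed update rates, chosen from within the ranges discussed.
LAYER_HZ = {"reflex": 100, "reaction": 20, "planning": 2}


def simulate(seconds: float) -> Counter:
    """Count how often each layer would update over the given duration."""
    runs = Counter()
    for tick in range(int(seconds * BASE_HZ)):
        for name, hz in LAYER_HZ.items():
            period = BASE_HZ // hz  # ticks between updates for this layer
            if tick % period == 0:
                runs[name] += 1
    return runs


counts = simulate(1.0)
```

Over one simulated second, the reflex layer updates 100 times, the reaction layer 20 times, and the planner twice. A single model asked to do all three jobs would have to run at the fastest rate or miss the reflex deadline.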

This framing also challenges the idea that clumsy robots simply need more parameters. Often, the real problem is inadequate time spent characterizing the physical system. Better demonstrations require patient engineering, not just a larger model rushed onto a stage.

Agents Need to Teach, Not Just Search

Research agents were another recurring example of an underhyped opportunity that has not yet fulfilled its promise.

Deep research systems made information gathering feel dramatically easier. But collecting sources is only half the job. A long report can move information from the internet into a document without moving much of it into the user’s mind.

The next generation of research agents should optimize for knowledge transfer. That could mean adapting explanations to a user’s background, checking understanding, surfacing uncertainty, comparing competing interpretations, and returning to weak spots over time. A genuinely useful research agent would not simply deliver findings. It would help the user build a durable mental model.

The same idea extends to education. If AI can compress the time required to reach competence, then preserving curricula designed around four-year institutions makes little sense. The transformative question is not whether students can use AI to complete existing assignments faster. It is whether courses, credentials, and learning sequences can be redesigned around a much shorter path to mastery.

Access matters just as much. The value of intelligence will be limited if its best interfaces, languages, pricing, and infrastructure serve only wealthy users in the United States. Expanding access across regions and populations may prove more consequential than another incremental gain at the top of a benchmark.

The Limits of Autonomy

Several ideas labeled “overhyped” shared a common assumption: that progress means giving a model a goal, letting it run for a very long time, and expecting useful results at the end.

Long-running autonomy remains brittle. Agentic harnesses can help, but elaborate scaffolding may become less important as models learn to create their own workflows from a few examples. The economically meaningful impact of advanced AI is also likely to vary sharply by domain. Even a highly capable general intelligence may be too expensive, too slow, or too disconnected from the physical world to replace cheap human labor everywhere.

Active learning offers a more grounded direction. Today’s models are often poor at recognizing what they do not know, and they do not reliably direct resources toward closing those gaps. Systems that can identify uncertainty, seek the right evidence, and choose when to ask for help may be more valuable than agents that merely run longer.
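A simple version of "know what you do not know" can be sketched with entropy over a model's candidate answers. This is an illustrative heuristic, not a method described at the dinner. The thresholds and action names are hypothetical, and it assumes the model's probabilities are reasonably calibrated, which is often not the case.

```python
import math


def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits; higher means the model is less certain."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def next_action(probs: list[float],
                answer_below: float = 0.5,
                ask_above: float = 1.5) -> str:
    """Pick an action from uncertainty (thresholds are assumed values)."""
    h = entropy(probs)
    if h < answer_below:
        return "answer"
    if h > ask_above:
        return "ask_for_help"
    return "seek_evidence"
```

A distribution concentrated on one candidate leads to an answer, a moderately uncertain one triggers evidence gathering, and an even split across four candidates, which is two bits of entropy, escalates to a person.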

What Comes Next

The most important frontiers may now sit between the familiar categories.

Cybersecurity will test models as both offensive and defensive tools, likely creating a volatile period before institutions and protections catch up. Visual interactivity could bring language-model flexibility into video, allowing people to manipulate generated scenes directly rather than regenerate them through text. Real-time interaction data could improve social and conversational systems in ways static internet corpora cannot.

But the strongest theme from the discussion was simpler: the next leap in AI may be less about raw intelligence than about how intelligence meets people.

Faster feedback. Better timing. More honest uncertainty. Deeper learning. Wider access. Interfaces that understand more than words.

These advances may sound modest beside the promise of AGI. In practice, they are the ones most likely to determine whether AI becomes genuinely useful.

The Last Word

The dinner’s quiet through-line was a kind of optimism that does not always make headlines: the intelligence is already here. What remains is the unglamorous, deeply human work of distribution, trust, latency, evaluation, and deployment into industries that look nothing like the AI industry. Or as one guest put it: the models are powerful enough that the bottleneck is now everything around them.

That, and the people. The AI being built is remarkable, but the room full of people building it, arguing about micro-turns over uni and comparing notes across labs that compete by day, is the part no benchmark captures.

The AGI House Dinner Series continues. If you are building at the frontier and want a seat at the table, you know where to find us.

AGI Dinner Series: What’s Actually Underhyped