AI × Healthcare: A Technical Market Map from the Build Floor

Frontier labs have spent the last 18 months turning general-purpose models into healthcare-capable systems. GPT-5.x now exceeds human baselines across clinical roles measured in major evaluations and has reached near-ceiling performance on medical benchmarks—including USMLE-level scores well above passing thresholds—closing much of the "clinical reasoning" gap that held back earlier generations.

These releases validate that frontier-grade multimodal reasoning is now available off the shelf for healthcare workloads: GPT-5.x and Claude Opus handle long-context clinical documentation, patient conversations, and trial analytics, while NVIDIA-class models specialize in imaging and simulation. The open question—and the upside for new companies—is who will own the vertical control planes that sit on top of these models: EHR-embedded copilots, imaging operating layers, RPM risk engines, and payer-provider negotiation systems that transform raw model capability into measurable reductions in cost, burnout, and trial timelines.

Our thesis centers on three compounding wedges:

EHRs / Admin / Payer-Provider AI: AI's fastest ROI is in paperwork: automating notes, prior auth, coding, and revenue-cycle tasks so clinicians recover hours per day. The winners will be deeply embedded "administrative OS" layers inside EHRs that own workflows, generate rich structured exhaust (notes, codes, appeals), and provide the governance needed for hospitals and payers to safely offload routine decisions.

Preclinical + Clinical Applications: Foundation models are becoming reasoning engines over biology and patient journeys, not just single-task predictors. In preclinical work they shrink the search space for experiments; in the clinic they power imaging copilots and trial-design tools that triage scans, spot longitudinal patterns, and design tighter studies—improving as they ingest more wet-lab results, trial endpoints, and real-world outcomes.

Wearables & Remote Patient Monitoring (RPM): Continuous sensing shifts care from episodic visits to 24/7 risk management, but only if AI can turn noisy streams into patient-specific risk scores and concise clinician-ready summaries. Lasting value will accrue to RPM platforms that plug cleanly into EHR workflows, hit clinical-grade accuracy in prospective studies, and run efficient multimodal models near the edge so commodity devices become trusted medical infrastructure.

Our vantage point is uniquely grounded. In January 2025, AGI House co-hosted an intensive healthcare AI build day in Hillsborough, CA alongside Mithrl, bringing together researchers, founders, and technical operators for seven hours of structured prototyping across four tracks: agentic platforms for compound discovery, biomedical data access, administrative automation, and open research.

EHRs / Admin / Payer-Provider AI

Administrative burden consumes more than half of clinician time and over $1T annually in U.S. healthcare costs. AI's fastest ROI is automating documentation, prior authorization, coding, and revenue cycle—not by replacing workflows but by embedding deeply as an "administrative OS" layer inside EHRs. Winners will own the structured data exhaust (notes, codes, appeals) and governance rails needed for hospitals and payers to safely offload routine decisions.

The key insight: this isn't about better chatbots. It's about control planes that sit between clinicians and administrative systems, generating billing-grade outputs while maintaining audit trails for compliance.

Market Evidence

Documentation Automation: Ambient clinical documentation has emerged as the breakout category. Early adopters report major reductions in after-hours charting time. Microsoft/Nuance's DAX Copilot is now embedded in Epic and processing encounters across hundreds of health systems, while startups like Abridge (now deployed in 150+ health systems) have seen explosive growth. In head-to-head evaluations, Cleveland Clinic found meaningful quality differences between ambient AI vendors—suggesting differentiation exists beyond commoditized transcription.

Prior Authorization: Prior auth consumes double-digit hours per physician per week on average. Cohere Health reports that the vast majority of requests are auto-approved through AI, with the remaining cases turning around in days rather than weeks. The CMS Interoperability and Prior Authorization rule is forcing health plans to adopt standardized electronic workflows by 2026, creating regulatory tailwinds for automation platforms.

Revenue Cycle: Medical coding AI is achieving high accuracy on E&M codes and surgical procedures. AI-assisted denial management is showing substantially higher overturn rates. Health systems are recovering tens of millions annually through coding optimization. The emerging play: ambient documentation that captures billing-grade information at the point of care, eliminating the back-and-forth between clinicians and revenue cycle teams.

Technical Architecture (Summary)

The model stack that matters:

Foundation layer: Multimodal clinical LLMs that handle voice (ambient listening), text (EHR context), and structured data (labs, vitals) simultaneously. Models need long context windows to ingest full patient histories. Players are building on frontier models (GPT-4 class, Claude) with domain-specific fine-tuning, though some (Abridge, Ambience) are developing proprietary models.

Specialized task heads: ICD-10/CPT coding requires hybrid approaches—LLMs alone hallucinate rare codes. Successful systems combine retrieval (RAG over code databases) with classification heads trained on diagnosis-procedure relationships.
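A minimal sketch of that hybrid pattern, with a toy in-memory code table and keyword overlap standing in for a real embedding-based retriever: an LLM's proposed codes are kept only when retrieval over the code database independently supports them, so hallucinated codes never reach billing.

```python
# Hybrid coding sketch: LLM proposals filtered by retrieval support.
# ICD10_DB and the keyword scorer are toy stand-ins, not a real coding system.
import re

ICD10_DB = {
    "E11.9": "type 2 diabetes mellitus without complications",
    "I10": "essential primary hypertension",
    "J45.909": "unspecified asthma uncomplicated",
}

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve_candidates(note: str, k: int = 2) -> set[str]:
    """Rank codes by keyword overlap with the note (embedding-RAG stand-in)."""
    ranked = sorted(ICD10_DB, key=lambda c: -len(words(note) & words(ICD10_DB[c])))
    return set(ranked[:k])

def validate(proposed: list[str], note: str) -> list[str]:
    """Keep only LLM-proposed codes that retrieval independently supports."""
    candidates = retrieve_candidates(note)
    return [c for c in proposed if c in candidates]

note = "Follow-up visit for type 2 diabetes and essential hypertension."
print(validate(["E11.9", "I10", "C96.Z"], note))  # hallucinated C96.Z dropped
```

The key design point is that the classifier/retriever acts as a gate on the generator, rather than trusting free-form LLM output for billing-grade codes.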

Real-time infrastructure: Ambient documentation requires sub-second latency from speech to structured note. This demands streaming ASR, incremental LLM processing, and edge-cloud hybrid architectures.

Integration layer: The real moat is EHR integration. Bidirectional FHIR APIs for reading patient context and writing back notes/orders. Epic and Oracle dominate; their APIs are proprietary, slow-moving, and expensive to access. Middleware players (Redox, Health Gorilla) are becoming "Plaid for healthcare," abstracting EHR complexity.
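To make the write-back path concrete, here is a sketch of packaging an AI-drafted note as a FHIR R4 DocumentReference, the resource type typically used for clinical documents. The patient ID and note text are hypothetical, and a real Epic or Oracle integration additionally requires app registration, OAuth2 scopes, and vendor review before any POST is accepted.

```python
# Sketch: wrap a drafted clinical note in a minimal FHIR R4 DocumentReference.
# Endpoint, patient ID, and note content are illustrative placeholders.
import base64
import json

def build_document_reference(patient_id: str, note_text: str) -> dict:
    """Build a minimal DocumentReference resource for EHR write-back."""
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "type": {"coding": [{
            "system": "http://loinc.org",
            "code": "11506-3",  # LOINC code for a progress note
            "display": "Progress note",
        }]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "content": [{"attachment": {
            "contentType": "text/plain",
            # FHIR attachments carry the document body base64-encoded
            "data": base64.b64encode(note_text.encode()).decode(),
        }}],
    }

doc = build_document_reference("example-123", "Subjective: ...")
# A real integration would POST this JSON to {fhir_base}/DocumentReference
print(json.dumps(doc)[:60])
```
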

Key Players

Company | Focus | Notes
Nuance (Microsoft) | DAX Copilot ambient documentation | First-mover, deep Epic integration, expanding to nursing workflows
Abridge | Ambient clinical documentation | 150+ health systems, expanding into prior auth and revenue cycle
Ambience Healthcare | AI operating system for documentation, coding, CDI | Won Cleveland Clinic head-to-head; strong inpatient/ED focus
Suki | AI voice assistant for clinicians | Hundreds of health systems, Zoom partnership for platform distribution
Cohere Health | Prior authorization automation | Millions of PA requests/year, high auto-approval rates, expanding into care management
Nabla | Ambient copilot for practitioners | GPT-4 based, FHIR-native outputs, European expansion
Notable Health | End-to-end workflow automation | Prior auth + coding + scheduling

Unsolved Challenges

Technical

  • Hallucination risk in high-stakes outputs: LLMs confidently generate plausible but incorrect codes or diagnoses. No robust uncertainty quantification exists for medical LLMs.
  • Latency vs. accuracy trade-offs: Real-time ambient documentation requires fast response, but most accurate models are slow. Quality degradation with compression remains significant.
  • Context window limits: Full patient histories can exceed current model windows; attention degradation and "lost in the middle" issues persist.
  • Model drift: Medical coding changes annually, guidelines evolve, payer policies shift. Most deployed models are static and require manual retraining.

Integration & Workflow

  • EHR vendor lock-in: Epic and Oracle dominate with proprietary APIs. No "write once, deploy everywhere" exists for healthcare AI.
  • Clinician trust: "I can write the note faster than I can edit the AI's version" mentality persists. UX must make review faster than from-scratch writing.
  • Medicolegal clarity: Unclear who's liable for AI-assisted documentation—still evolving.

Business Model

  • Reimbursement uncertainty: Unclear who pays—hospitals (cost savings) or payers (better coding → higher revenue)? No consensus on value-based contracting for AI efficiency gains.
  • FDA oversight ambiguity: Documentation/coding AI currently exempt as "clinical decision support," but line is blurring as models become more autonomous.

Next Steps & Big Bets

Infrastructure maturation: FHIR API adoption accelerating under CMS pressure. Middleware consolidation (Redox, Health Gorilla) abstracting EHR complexity. Native AI EHRs (Canvas Medical) could disrupt if they achieve critical mass.

Platform expansion: Leaders expanding beyond documentation into revenue cycle, prior auth, and clinical decision support—building full "administrative OS" plays rather than point solutions.

The big bets:

  • Administrative OS platforms that embed deeply in EHRs and own the structured data layer will capture disproportionate value (Microsoft/Nuance positioning here; Abridge and Ambience pursuing similar strategies)
  • Vertical specialists (prior auth, coding) may win near-term but face commoditization as foundation models improve and horizontal players expand
  • Federated data collaboratives enabling multi-institutional AI training will become critical infrastructure

Key question: Who will own the control plane between clinicians and administrative systems? Winners will combine foundation model access, deep EHR integration, compliance infrastructure, and continuous learning—a full-stack play, not just a model wrapper.

Clinical Applications

Medical Imaging

AI as Triage

The foundational use case: foundation models (FMs) trained on image + metadata now sit upstream of radiologist worklists and decide which studies are urgent enough to interrupt the current reading queue. Stroke, PE, and ICH detection are the canonical examples, but the next evolution is scoring every study on clinical urgency plus predicted resource demand (e.g., will this patient need the IR suite, ICU admission, cath lab?), creating a dynamic queue where the AI continuously reorders cases based on real-time triage signals and downstream availability.

Viz.ai's evolution from stroke-only alerts to multi-pathway cardiovascular coordination is an early glimpse of this architecture: the product is less "stroke model" and more routing fabric linking scanners, call teams, and cath labs. Cases bubble up or down the list in real time as new studies arrive and downstream capacity shifts. Triage logic is no longer hard-coded; it is learned from historical throughput, escalation patterns, and outcomes, analogous to reinforcement-learning-based schedulers.
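The dynamic-queue idea can be sketched with a simple priority heap. The scoring function, coefficients, and study entries below are illustrative stand-ins for what would, in a real system, be learned from throughput and outcomes data; the point is only that priority is a function of both urgency and downstream capacity, so the same worklist reorders as capacity shifts.

```python
# Toy dynamic radiology worklist: priority = f(urgency, resource demand, capacity).
# Scores and coefficients are hand-set illustrations, not learned values.
import heapq

def priority(urgency: float, resource_demand: float, capacity: float) -> float:
    # Higher urgency raises priority; when downstream capacity (e.g. the cath
    # lab) is scarce, resource-hungry cases are discounted so they don't stall
    # the queue while waiting for that resource.
    return urgency - (1.0 - capacity) * 0.5 * resource_demand

def rebuild_worklist(studies, capacity):
    """Re-rank the whole worklist whenever arrivals or capacity change."""
    heap = [(-priority(u, r, capacity), sid) for sid, u, r in studies]
    heapq.heapify(heap)  # min-heap on negated priority = max-priority first
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

studies = [  # (study id, model urgency score, predicted resource demand)
    ("routine-xr", 0.20, 0.1),
    ("chest-ct-pe", 0.90, 0.2),
    ("head-ct-ich", 0.95, 0.9),
]
print(rebuild_worklist(studies, capacity=1.0))  # free capacity: urgency wins
print(rebuild_worklist(studies, capacity=0.1))  # scarce capacity reorders
```
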

From Apps to an AI Control Plane

Multimodal imaging AI is shifting from "one-off detectors" to operating-layer systems that control which exams get done, how radiologists read them, and how money flows through the department. The investable wedge is now whoever owns the orchestration, reporting, and reimbursement rails that sit on top of foundation models.

Winners will likely be those that: (1) embed deeply in radiology IT as AI operating layers, (2) convert images directly into billing-grade structured and narrative output, and (3) tie their economics to throughput, revenue, and guideline adherence rather than AUROC.

Radiology has more than a thousand FDA-cleared AI devices, yet only a handful are used daily because each lives as a separate pop-up, viewer, or workflow fragment. RSNA 2025 content and vendor roadmaps now frame multimodal AI as an operating layer that ingests images, priors, metadata, and EHR context to drive scheduling, protocoling, triage, visualization, and reporting across the same backbone.

The emerging architecture:

  • Unified multimodal foundation model backbone: Foundation models trained on tens of millions of patient journeys now underlie platforms like Aidoc's CARE model and vendor-neutral "AI OS" stacks, allowing dozens of indications to be spun up from shared representations.
  • Orchestration-first design: Health-system CIOs increasingly evaluate AI on its ability to orchestrate routing, escalation, and notifications across PACS/RIS/EHR rather than on any single algorithm's standalone accuracy.

Essentially, medical imaging AI is converging on a "control plane" abstraction, where the defensible asset is the orchestration and monitoring layer that sees every study and every downstream action, not the individual detector.

Structured Second Sight: AI as Co-Readers

Rather than replacing radiologists, co-reader systems assume the human stays in the loop and instead optimize for complementary vision. Studies at RSNA 2025 show combined human + AI reading improves sensitivity for cancers and vascular events, particularly in high-volume settings like lung screening and ED CT.

Modern co-readers look very different from the old red-circle overlays:

  • Vision-language backbones surface candidate findings with rationales—bounding boxes tied to textual descriptors and guideline snippets—rather than raw heatmaps.
  • Systems track which flags radiologists accept, modify, or reject, generating a continuous feedback stream that tunes thresholds by site, subspecialty, and individual reader.

Instead of opaque scores, the model surfaces candidate findings with bounding boxes, textual descriptors, and links to guideline statements (e.g., Fleischner, BI-RADS) retrieved via RAG, so radiologists see both where and why the model is concerned.

This is where Rad AI and similar platforms become interesting: the same FM that spots a nodule also drafts the corresponding sentence and suggests follow-up, so accepting a suggestion updates both the image-side and language-side behavior.

The "v0 Moment" in Radiology: AI Automation

Radiologists spend a surprising amount of time not "seeing" but documenting: measuring, comparing to priors, filling structured fields, and rewriting nearly identical impressions. Imaging FMs are beginning to eat this entire surround.

Industry analyses around RSNA 2025 converge on one clear signal: AI-assisted reporting is the first use case with both mature technology and real pull from practices. Companies like Rad AI, Sirona Medical, and others now deploy systems that draft impressions, structure measurements, and harmonize language for large groups of radiologists, cutting dictation time while tightening consistency.

This architecture turns automation into a multi-head FM problem rather than a zoo of narrow models: one backbone, many lightweight heads, all trained or fine-tuned jointly. Radiologists can nudge style and thresholds, but do not have to micromanage dozens of separate algorithms.

Research Acceleration

Foundation models are turning every imaging archive into a self-supervised research dataset.

Native vision-language radiology FMs are now trained on tens of millions of images plus paired reports, enabling zero-shot labeling, retrospective cohort discovery, and cross-study pattern mining that would have been infeasible with manual curation. Emerging multimodal models link imaging with clinical notes, labs, and genetics, supporting tasks like prognostic risk scoring, virtual control arms, and imaging-based phenotypes—even before explicit "drug discovery" or "clinical trial" workflows are layered on top.

The non-obvious implication: the radiology department becomes the query interface for hospital-scale observational research, with FMs providing programmable access to longitudinal image-derived phenotypes rather than just reports.

Challenges

Reimbursement: Radiology remains a paradox: over a thousand AI devices cleared, but effectively one mature Category I CPT code for imaging AI (HeartFlow) and a small set of NTAP or Category III carve-outs. Most tools are 510(k)-cleared without prospective outcome data, forcing hospitals to justify purchases via efficiency gains or quality narratives rather than direct revenue. Next-step opportunity: "billing-native" imaging FMs that emit structured codes, supporting measurements, and denial-resistant justifications as first-class outputs—turning radiology AI from a cost center into a revenue-cycle lever.

Workflow integration: Radiologists live in PACS viewers and dictation/reporting software; every separate AI window adds friction and "app fatigue." Market data from KLAS and RSNA 2025 emphasizes that tools embedded directly into existing viewers and reporting UIs see an order-of-magnitude higher sustained use than standalone consoles. This pushes the ecosystem toward unified AI layers where findings appear as overlays and suggested text in draft reports, making orchestration and UX, not just accuracy, the real competitive moat.

Generalizability gaps: Most models have been trained and validated on academic-center datasets with narrow equipment and protocol distributions. Studies continue to show performance drop-off when models encounter community scanners, different acquisition parameters, or under-represented populations. There is emerging demand for continuous-learning and monitoring infrastructure—federated fine-tuning, drift detection, and site-specific calibration dashboards—that can make FMs robust in real-world environments.

Drug Discovery

Research Acceleration

Protein structure prediction: AlphaFold-style models replaced slow physics simulations with transformer networks that predict 3D structure directly from sequence, giving researchers a starting point for "where to bind." Companies like Isomorphic Labs are effectively productizing this interaction engine, now using these models not just to fold proteins but to predict how small molecules sit in pockets, scoring billions of candidates inside the learned structural landscape.

Generative molecular design: Earlier pipelines used classic ML (random forests, simple neural nets) to rank chemist-designed molecules; the search space still depended on human intuition. Modern systems use graph transformers and diffusion models to generate new molecules conditioned on desired properties or on a protein surface, then iteratively update the model based on assay results—an active-learning loop that shrinks the number of lab cycles needed to find a viable hit.
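The active-learning loop has a simple skeleton: generate candidates, rank them with a surrogate model, "assay" the top pick, refit, repeat. The sketch below uses a hidden scalar property, a random-number generator, and a nearest-neighbor surrogate as stand-ins for the real generator, assay, and property predictor; only the loop structure is the point.

```python
# Toy design-make-test-learn loop. Everything here is a stand-in: real systems
# use graph transformers/diffusion generators and wet-lab assays.
import random

random.seed(0)

def true_assay(x: float) -> float:
    """Hidden property landscape (unknown to the model); optimum at x = 0.7."""
    return -(x - 0.7) ** 2

labeled = [(0.0, true_assay(0.0)), (1.0, true_assay(1.0))]  # seed data

def surrogate(x: float) -> float:
    """1-nearest-neighbor surrogate: predict the label of the closest assayed point."""
    return min(labeled, key=lambda p: abs(p[0] - x))[1]

for cycle in range(3):
    candidates = [random.random() for _ in range(50)]  # generator stand-in
    pick = max(candidates, key=surrogate)              # rank with surrogate
    labeled.append((pick, true_assay(pick)))           # run the "assay", refit

best = max(labeled, key=lambda p: p[1])
print(f"best candidate after 3 cycles: x={best[0]:.2f}")
```

Each cycle the surrogate steers sampling toward regions the assays have already validated, which is the mechanism by which the loop shrinks the number of lab cycles needed.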

Synthetic data generation: Foundation models like ESM3 treat protein sequence, structure, and function as one joint vocabulary and are trained on billions of natural proteins. That training lets them act as "evolution simulators," proposing mutations and designs that are more likely to be viable before any experiment, effectively giving generative models a strong biological prior instead of starting from random noise.

Genomic mapping: On the data side, deep sequence models and transformers over genomic data replace hand-picked gene panels, learning latent representations that correlate with disease subtypes and treatment response. Platforms like Tempus and Recursion combine these embeddings with imaging and clinical outcomes to discover new patient subgroups and multi-cancer signatures, informing which programs to advance and how to stratify trials.

What's Still Unsolved

Clinical proof points do exist. Insilico Medicine's AI-designed molecule in Phase 2a is an important signal, but the field still needs multiple Phase 3 successes to prove that model-driven pipelines consistently beat traditional ones. Until then, most big-ticket deals are priced on potential rather than proven performance, and the next step is to de-risk them with late-stage clinical wins.

Hallucinations are also not an LLM-specific issue. Generative models can produce molecules that look great in silico but fail for reasons the model never saw in training (instability, immune issues, subtle toxicity). Unlike text, there is no easy "fact-check," so teams are layering filters, property predictors, and explicit uncertainty estimates on top of generators to avoid chasing dead ends.

While academia continues to produce promising theoretical frameworks for AI in healthcare, translating these to industry is a true bottleneck. Many academic benchmarks still bake in data leakage or optimistic evaluation, so even high model accuracy cannot guarantee industry acceptance. Serious buyers now expect full pipeline audits—how data were split, how models behave on new chemotypes, and whether improvements hold up when plugged into live programs, not just on public datasets.

Key Players

  • Isomorphic Labs – DeepMind spin-out using AlphaFold-class models plus generative design to work on undruggable targets with Novartis and Lilly; multiple programs entering the clinic.
  • Xaira Therapeutics – $1B-backed platform that couples state-of-the-art molecular generators with a tightly integrated wet lab, aiming to industrialize the "design-make-test-learn" loop for proteins and small molecules.
  • Recursion – Public company running massive imaging and omics screens; uses large vision models on cell images and graph models on perturbation data to map how compounds change cell state at scale.
  • Insilico Medicine – End-to-end AI pipeline operator with one of the first AI-designed drugs in Phase 2a, making it a key bellwether for whether this category can generate approved products.
  • Generate Biomedicines – Flagship-backed generative biology firm using protein language models to design novel antibodies and enzymes, rather than tweaking natural backbones.
  • EvolutionaryScale (ESM3) – Builders of the ESM3 protein foundation model, positioned as shared infrastructure for any team that wants strong priors for protein design and synthetic biology.
  • Tempus – Precision-medicine network linking genomic and clinical data; uses deep models over multi-omics to power targeted therapy selection and trial matching at the point of care.

Clinical Trials

Clinical trials are the gridlock of biomedicine: they are slow to enroll, expensive to run, and frequently underpowered or inconclusive. AI is starting to matter here not because it "replaces" trials, but because it quietly rewrites the operational math around who gets enrolled, how protocols adapt, and how many patients need to sit in control arms.

Patient Recruitment

Today, finding eligible patients still looks like manual chart review against dense inclusion/exclusion criteria, which is slow and misses edge cases. Tempus, ConcertAI, and others now run large-scale NLP and LLM pipelines over structured and unstructured EHR data—pathology reports, radiology notes, clinic letters—to auto-screen candidates and route the best matches to sites.

The result is a trial-matching layer that:

  • Parses free-text criteria into machine-readable rules using LLMs, then applies them across millions of records in hours instead of weeks.
  • Filters out clearly ineligible patients early, so coordinators spend their time on the marginal cases that actually require judgment, cutting staff burden while lifting total eligible volume.
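The parse-then-filter split can be sketched as follows. The rule tuples stand in for LLM output: in production an LLM would parse free-text inclusion/exclusion criteria into structures like these, which are then applied cheaply across millions of records; the patient records and field names are illustrative.

```python
# Trial-matching sketch: machine-readable rules (as an LLM parser might emit
# them) applied across patient records. All data here is illustrative.

criteria = [  # (field, operator, value) tuples from parsed criteria text
    ("age", ">=", 18),
    ("egfr", ">=", 30),          # inclusion: adequate renal function
    ("prior_chemo", "==", False),  # exclusion: no prior chemotherapy
]

OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
}

def eligible(patient: dict) -> bool:
    """A patient passes only if every parsed rule holds."""
    return all(OPS[op](patient[field], value) for field, op, value in criteria)

patients = [
    {"id": "p1", "age": 54, "egfr": 62, "prior_chemo": False},
    {"id": "p2", "age": 71, "egfr": 24, "prior_chemo": False},  # fails eGFR
    {"id": "p3", "age": 49, "egfr": 55, "prior_chemo": True},   # fails exclusion
]
shortlist = [p["id"] for p in patients if eligible(p)]
print(shortlist)  # only p1 survives auto-screening
```

Coordinators then review only the shortlist plus any records the parser flagged as ambiguous, which is where the staff-time savings come from.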

Trial Design Optimization

Before the first patient is dosed, sponsors need to decide where to open sites, how tight to set criteria, and what endpoints to prioritize; those decisions make or break timelines. AI platforms now simulate these choices upfront using historical real-world data and simple, well-calibrated prediction models.

Site-level models predict enrollment velocity and data quality based on prior performance, referral patterns, and local demographics, letting sponsors steer enrollment toward high-yield centers and away from habitual under-performers.

Models trained on past trial and real-world data can group patients into likely "good responders" and "poor responders," based on patterns in their histories. Statisticians then use these groups to test different inclusion criteria before the trial starts and to adjust the mix of patients mid-study—shifting enrollment toward groups that seem to benefit more as early results come in.

Synthetic Control Arms: "Digital Twins"

The most conceptually new piece is AI-generated "digital twins" that stand in for some control-arm patients. Instead of randomizing every participant to placebo or standard of care, sponsors train prognostic models on rich historical data to predict how each enrolled patient would have fared under control, then use those predictions in the analysis.

Unlearn, for example, builds disease-specific models that take a baseline patient profile and output an individualized control trajectory; these are locked and versioned before the trial starts, then used to augment or partly replace the control group in the final statistical analysis. Regulators have already accepted this approach in specific settings—EMA qualification for Unlearn's PROCOVA method and FDA alignment signal that digital-twin-assisted designs are moving from thought experiment to regulated practice.
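A simplified numerical illustration of the digital-twin idea: a prognostic model, frozen before the trial, predicts each patient's outcome under control from baseline covariates, and observed treated outcomes are compared against those predictions. This is a toy version for intuition only, not Unlearn's actual PROCOVA procedure, which uses the prediction as a covariate in a pre-specified ANCOVA; the coefficients and patient data are invented.

```python
# Toy digital-twin comparison: observed treated outcomes vs. each patient's
# predicted control trajectory from a locked pre-trial prognostic model.
from statistics import mean

def prognostic_model(baseline_severity: float) -> float:
    """Locked pre-trial model: predicted 12-week decline under control.
    Toy linear coefficients, fit on (hypothetical) historical control data."""
    return 1.2 * baseline_severity + 0.5

treated = [  # (baseline severity, observed 12-week decline on drug)
    (2.0, 2.1),
    (3.0, 3.2),
    (4.0, 4.0),
    (5.0, 5.1),
]

# Per-patient effect: observed outcome minus that patient's predicted
# control outcome; negative values mean less decline than predicted.
effects = [obs - prognostic_model(base) for base, obs in treated]
print(f"estimated mean effect: {mean(effects):+.2f}")
```

Because every treated patient contributes an individualized comparison, fewer concurrent control patients are needed to reach the same statistical power, which is the economic appeal of the design.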

Key Players

  • Tempus – Uses AI screening across integrated EHR-genomic networks to surface trial-eligible patients and enable "just-in-time" site activation, improving enrollment in biomarker-driven oncology studies.
  • Unlearn AI – Pioneers digital-twin control arms, with EMA-qualified methods and FDA alignment around using locked prognostic models to boost power or reduce control-arm size in Phase II/III trials.
  • ConcertAI – Runs Precision Trials, a SaaS platform that combines oncology real-world data with LLM-based screening and feasibility models to pick better sites, tune criteria, and run screening workflows significantly faster than manual review.

Wearables and Remote Patient Monitoring (RPM)

The digital health market is expanding quickly as AI, wearables, and remote patient monitoring converge, with analysts estimating sustained double-digit annual growth as these categories blur into one stack.

At the same time, care is shifting from episodic, visit-based encounters to continuous 24/7 health surveillance, where always-on sensors and predictive models flag emerging issues early enough for outpatient intervention instead of hospital admissions.

Market Landscape

Consumer wearables (Apple, Fitbit, Oura, Garmin, Samsung, and new AI-enhanced devices) dominate mindshare and are collecting the largest labeled physiological datasets on earth.

Medical-grade RPM platforms such as Biofourmis, Dexcom, Validic, and Best Buy's Current Health acquisition are building the infrastructure to make those data clinically actionable—EHR pipes, care-team dashboards, and reimbursement pathways for hospital-at-home and chronic-care programs.

Specialized devices (AliveCor for smartphone ECG, Eko for digital auscultation, Empatica and others for seizure and respiratory monitoring) show that narrow, high-accuracy use cases can sustain real businesses once paired with strong ML.

Progress

Medical-grade RPM platforms

  • Biofourmis pairs its BiovitalsAI engine with connected sensors to run hospital-at-home and chronic-care programs, extending step-down monitoring into the home.
  • Validic is the data backbone, normalizing feeds from hundreds of devices into EHRs so health systems can run RPM at scale without custom integrations.
  • Dexcom anchors continuous glucose monitoring with AI-guided insights around glycemic patterns, time-in-range, and personalized alerts for people with diabetes.
  • Current Health (Best Buy) bundles FDA-cleared vitals devices and a home hub into turnkey monitoring kits that flag trends and prompt clinician intervention.

Specialized AI-enhanced devices

  • Withings BeamO combines ECG, pulse oximeter, digital stethoscope, and thermometer in one handheld device for consumers and telehealth.
  • AliveCor turns smartphones into ECG recorders and uses AI to detect arrhythmias, including some hyperkalemia signatures, beyond basic AFib screening.
  • Eko layers ML models onto digital stethoscopes to detect heart murmurs and AFib at the point of care.
  • Empatica builds seizure-detection wristbands that analyze motion and physiology in real time to alert caregivers.
  • Oxitone Medical offers wrist-based pulse oximetry, avoiding fingertip probes while enabling continuous SpO₂ tracking.
  • Strados Labs' RESP biosensor captures lung sounds and uses AI to quantify cough and wheeze for asthma and COPD monitoring.

Next Steps

Clinical integration: Providers need AI that turns continuous RPM streams into a few high-priority alerts instead of constant pings, or data will simply be ignored. Avoiding alert fatigue makes threshold tuning and role-specific routing core product features, and EHR-level integration is mandatory so insights appear in existing workflows.

Validation & accuracy: There is still a gap between consumer-grade signals and the accuracy clinicians need for medication or diagnosis decisions, keeping many devices in a "wellness" box. The upside is for teams willing to run real validation studies and seek clearance, turning pretty graphs into measurements payers and physicians can rely on.

Multi-modal integration: The stack is moving toward continuous multi-modal monitoring—wearables, implants, home cuffs, bed and room sensors—feeding one shared risk model. The defensible layer is the fusion model that converts many weak, noisy channels into stable, patient-specific risk scores.
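A minimal sketch of such a fusion layer: several noisy per-channel risk signals are confidence-weighted into one score, then smoothed over time so a single-channel glitch does not fire an alert. The channel names, weights, and readings are illustrative, not clinically validated values.

```python
# Toy multi-channel risk fusion with temporal smoothing.
# Weights and readings are illustrative stand-ins for learned model outputs.

CHANNEL_WEIGHTS = {  # hypothetical per-channel confidence weights
    "hr_elevation": 0.5,
    "bp_instability": 0.3,
    "sleep_disruption": 0.2,
}

def fuse(readings: dict[str, float]) -> float:
    """Weighted combination of per-channel risk signals, each in [0, 1]."""
    return sum(CHANNEL_WEIGHTS[ch] * readings.get(ch, 0.0)
               for ch in CHANNEL_WEIGHTS)

def smooth(scores: list[float], alpha: float = 0.3) -> float:
    """Exponential moving average: a stable score despite noisy instants."""
    ema = scores[0]
    for s in scores[1:]:
        ema = alpha * s + (1 - alpha) * ema
    return ema

days = [
    {"hr_elevation": 0.1, "bp_instability": 0.2, "sleep_disruption": 0.0},
    {"hr_elevation": 0.9, "bp_instability": 0.1, "sleep_disruption": 0.1},  # HR glitch
    {"hr_elevation": 0.2, "bp_instability": 0.2, "sleep_disruption": 0.1},
]
risk = smooth([fuse(d) for d in days])
print(f"smoothed risk: {risk:.2f}")  # the day-2 spike is damped, not alerted on
```

Real fusion models learn these weights per patient and per device; the sketch only shows why fusion plus smoothing, rather than per-channel thresholds, is the layer where defensibility accrues.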

Advanced capabilities: Predictive analytics aim to forecast events like COPD flare-ups or dialysis-related electrolyte issues days in advance, enabling pre-emptive outreach. Generative models can summarize weeks of sensor data into short, clinician-ready narratives, while on-device inference keeps critical alerts low-latency and privacy-preserving.

Ultimately, rigorous clinical validation—prospective trials and head-to-head comparisons—will determine which RPM stacks become part of standard care.

Insights from the Build Floor

Our January 2025 healthcare build day surfaced patterns that market analysis alone would miss.

Biomedical data discovery remains a high-friction problem with clear demand. Teams building dataset search agents encountered the same bottleneck repeatedly: translating a research hypothesis into concrete data requirements (assay type, tissue, modality) is itself an underspecified reasoning task. The most effective approaches used structured decomposition—parsing hypotheses into explicit constraints before querying repositories like GEO or CellxGene—rather than end-to-end retrieval. Metadata quality and cross-repository inconsistency emerged as the binding constraints, not model capability.
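The structured-decomposition pattern can be sketched as a two-stage search: extract explicit constraints from the hypothesis first, then filter repository metadata against them. The keyword-based extractor below stands in for the LLM the teams used, and the catalog entries are invented, not real GEO or CellxGene records.

```python
# Hypothesis -> constraints -> metadata filter, instead of end-to-end retrieval.
# Constraint extraction is keyword-based here; build-day teams used an LLM.

def decompose(hypothesis: str) -> dict:
    """Turn a free-text hypothesis into explicit data constraints."""
    h = hypothesis.lower()
    constraints = {}
    if "single-cell" in h or "scrna" in h:
        constraints["assay"] = "scRNA-seq"
    for tissue in ("liver", "lung", "brain"):
        if tissue in h:
            constraints["tissue"] = tissue
    return constraints

CATALOG = [  # invented dataset metadata records
    {"id": "DS-001", "assay": "scRNA-seq", "tissue": "liver"},
    {"id": "DS-002", "assay": "bulk RNA-seq", "tissue": "liver"},
    {"id": "DS-003", "assay": "scRNA-seq", "tissue": "lung"},
]

def search(hypothesis: str) -> list[str]:
    """Return only datasets satisfying every extracted constraint."""
    c = decompose(hypothesis)
    return [d["id"] for d in CATALOG if all(d.get(k) == v for k, v in c.items())]

print(search("Does single-cell data show zonation shifts in fibrotic liver?"))
```

Making the constraints explicit also surfaces the real bottleneck the teams hit: when repository metadata is missing or inconsistent, the filter fails visibly instead of retrieval silently returning plausible-but-wrong datasets.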

Drug discovery agents are technically impressive but struggle with validation loops. Teams exploring compound prediction built systems that could traverse from target to hit series, leveraging public databases and structure-activity reasoning. But the gap between "plausible candidate" and "worth synthesizing" remains wide—these systems need tighter integration with wet-lab feedback to be more than sophisticated literature search.

Interoperability is the unglamorous moat. Across tracks, teams that started with clean data pipelines and structured outputs outperformed those with more sophisticated models but messier integration. The projects that impressed judges most weren't necessarily the most novel—they were the ones that understood healthcare's data reality: HL7, CDA records, unstructured PDFs, and the peculiarities of EHR exports.

Acknowledgments

We're grateful to Mithrl for co-sponsoring the Healthcare AI Build Day and bringing deep domain expertise to the event design. Special thanks to our speakers—Vivek Adarsh (Co-founder & CEO @ Mithrl), Mac Klinkachorn (Co-founder @ Trellis), Josiah Meyer (Founder & CEO @ HealthLeap AI), and Sophia Lugo (CEO & Chairman @ Radar Therapeutics)—whose perspectives on clinical workflows, regulatory pathways, and enterprise deployment shaped both the research tracks and this analysis. Our judges—Amanda Black (Investment Director @ Farallon Capital Management), Belinda Mo, and Prashaant Ranganathan—evaluated projects with the rigor that healthcare AI demands, distinguishing genuine technical progress from benchmark theater.

We extend our thanks to our cohosts Mike Ng, Belinda Mo, Amanda Black, Mac Klinkachorn, and Moiead Charawi for helping design an event that pushed participants beyond demo-ready prototypes toward systems that could survive contact with real clinical environments. The winning teams—across compound discovery, biomedical data access, and administrative automation—demonstrated that the gap between frontier model capability and healthcare deployment is narrowing, but only for builders who understand the operational constraints that benchmarks don't capture.