Blog Post

Over the past two months, I have immersed myself in the biotech and AI ecosystems—attending Unlock 2026, the Biotechnology Forum at Stanford, and SynBioBeta, as well as the recent DeepMind Enterprise Panel at AGI House—while speaking with R&D, BD, and innovation leaders across pharma about where AI adoption actually stands.

My vantage point at AGI House is specific. We work at the intersection of frontier AI builders and enterprise organizations, currently with a focus on pharma. That means I am often in the translation layer myself: listening to what AI-native companies believe is technically possible, what pharma leaders are actually willing to adopt, and where the two sides still misunderstand each other.

From that vantage point, one thing has become clear: the pharma version of multimodal AI looks very different from the consumer AI version.

When most technologists hear "multimodal AI," they think of systems that combine images and text — a model that describes a picture, generates an image from a prompt, or reasons across documents and screenshots. In pharma, multimodality is something deeper. It is not just about combining different data types. It is about learning translation functions between biological scales: from morphology to molecular state, from perturbation to phenotype, from cellular trajectory to patient outcome.

Each biological scale is not merely a different data format. It is a different physical reality.

A pathology image, a transcriptomic profile, a protein expression map, a CRISPR perturbation readout, a live-cell trajectory, and a clinical outcome all capture biology at different levels of organization. The frontier is not simply fusing these modalities into larger models. The frontier is learning how one level of biology can predict, explain, or simulate another.

The companies attracting serious pharma interest are not the ones claiming to have "all the data." They are the ones that have identified specific biological translation problems and built the data engines to solve them.

1. The clearest near-term case: translating images into molecular state

The most concrete multimodal story I encountered across recent conferences is also one of the easiest for pharma executives to understand: inferring rich molecular profiles directly from standard pathology images.

Noetik, a South San Francisco–based company building foundation models from primary human tissue, launched TARIO-2, a model designed to predict whole-transcriptome spatial gene expression profiles from standard H&E histopathology images. H&E staining is standard of care globally: nearly every tumor biopsy receives it, in hospitals around the world. Spatial transcriptomics, by contrast, is expensive, technically demanding, and unavailable in most clinical settings.

That asymmetry is the business insight.

If a model can convert what every hospital already has into something almost no hospital can routinely access, then multimodal AI stops being a research tool and starts becoming infrastructure. The value is not "more data." The value is asymmetric compression: using an inexpensive, universal measurement to infer an expensive, scarce, high-value biological layer.

Noetik has shown early validation suggesting that such models can identify biological patterns among drug responders using standard H&E images from historical clinical data — implying better patient selection, better trial design, and new ways to extract value from existing clinical archives. Its non-exclusive licensing deal with GSK is a signal that pharma is beginning to treat cross-modal inference as a serious partnership asset.

Genentech's research group has developed a parallel capability through its SHIFT model, translating H&E images into spatial transcriptomics and extending that approach to predict how cells respond to genetic perturbations from morphology alone. A key validation insight: perturbations with similar RNA effects also show similar morphological effects, and vice versa. Morphology is not merely a picture of biology. Under the right modeling framework, it can become an inferential window into molecular state.
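One way to make that validation idea concrete is to compare pairwise perturbation similarities in the two modalities and ask whether they agree. The sketch below is illustrative only: the perturbation names, the effect vectors, and the helper functions are hypothetical toy values chosen for this example, not SHIFT outputs and not Genentech's actual method.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy effect vectors (NOT real data). Each perturbation has an
# RNA-level effect and a morphology-level effect, in arbitrary units.
rna_effects = {
    "pert_A": [1.0, 0.0, 0.0],
    "pert_B": [0.9, 0.1, 0.0],
    "pert_C": [0.0, 1.0, 0.0],
    "pert_D": [0.0, 0.0, 1.0],
}
morph_effects = {
    "pert_A": [0.0, 2.0],
    "pert_B": [0.2, 1.9],
    "pert_C": [2.0, 0.0],
    "pert_D": [-1.0, -1.0],
}


def cosine(u, v):
    """Cosine similarity between two effect vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def cross_modal_concordance(first, second):
    """Correlate pairwise perturbation similarities across two modalities."""
    pairs = list(combinations(sorted(first), 2))
    sims_first = [cosine(first[a], first[b]) for a, b in pairs]
    sims_second = [cosine(second[a], second[b]) for a, b in pairs]
    return pearson(sims_first, sims_second)


concordance = cross_modal_concordance(rna_effects, morph_effects)
print(f"Similarity concordance between modalities: {concordance:.2f}")
```

A high correlation on real paired data would support treating morphology as a proxy for molecular state; a low one would suggest the two modalities are capturing different biology for those perturbations.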

For pharma leaders, this changes the diligence question. The question is not simply whether a model performs well on a benchmark. The question is: what scarce biological layer can this system infer from a widely available input, and does that inference improve a real R&D decision?

2. The missing modality: time

Most biological datasets are static. A sample is taken, measured once, and analyzed as a snapshot. But biology is not static. Cells transition, adapt, differentiate, resist treatment, and respond to perturbation over time. That temporal dimension is becoming one of the most important missing modalities in biology AI.

Cellanome, which participated in our recent AI for Life Sciences Salon, has built a live-cell multimodal platform around this premise. Its CellCage technology tracks individual cells longitudinally, linking molecular, functional, and temporal data from the same cells over time. Instead of measuring where a cell is at one moment, the platform generates a record of how that cell changes, responds, and differentiates. Shawn Levy, Cellanome's SVP of Scientific Affairs, represented the company at the salon, and his perspective on what longitudinal single-cell data can and cannot yet do for AI model training was one of the evening's sharpest contributions.

Cellular Intelligence applies a complementary approach at the stem cell level. Using barcoded, semi-permeable capsules, the company runs more than one million cell differentiation trajectories in parallel. Foundation models trained on this trajectory data have demonstrated surprising out-of-domain generalization, including models trained on healthy stem cell trajectories predicting cancer cell behavior.

Many drug failures are failures of dynamics. A molecule may bind the target. A biomarker may look promising at baseline. But disease progression, treatment resistance, immune adaptation, and toxicity unfold over time. Models that cannot reason over biological trajectories will struggle to predict the outcomes that actually matter for pharma decision-making.

3. From observing biology to perturbing it

The first generation of biological AI models largely learned to describe biology. The next generation is being trained to predict what happens when biology is changed.

This is a critical distinction for pharma. A model that describes a tumor is useful. A model that predicts which perturbation will change that tumor's behavior, which patients will respond, or which cellular context will produce toxicity is much closer to a discovery engine.

Xaira Therapeutics' X-Cell, launched in March 2026, is a 4.9-billion-parameter diffusion language model trained on X-Atlas/Pisces, the largest genome-wide CRISPRi Perturb-seq dataset ever reported, which contains 25.6 million perturbed single-cell transcriptomes across seven biologically diverse cellular contexts. The model is designed to predict how cells respond to genetic perturbations it has never seen, across cell types it was not trained on.

One notable architectural choice: diffusion rather than next-token prediction. Unlike words in a sentence, genes have no natural sequence order. Biology is not text. It has structure, constraints, feedback loops, and causal relationships that require domain-specific modeling choices — which is why biology foundation models cannot simply copy the architecture or training recipes of large language models.
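The point about gene order can be shown with a toy example. The sketch below does not implement diffusion and is not X-Cell's architecture; the gene names, expression values, and both summary functions are hypothetical. It only illustrates that a position-dependent view of an expression profile changes when genes are listed in a different order, while a set-based view does not.

```python
# Hypothetical expression profile: gene name -> expression value (toy numbers).
profile = {"GENE_A": 5.0, "GENE_B": 0.5, "GENE_C": 2.0, "GENE_D": 0.0}


def sequence_view(profile, gene_order):
    """Flatten the profile into a list whose meaning depends on position."""
    return [profile[gene] for gene in gene_order]


def position_weighted_sum(values):
    """Order-sensitive summary: each value is weighted by its position."""
    return sum((index + 1) * value for index, value in enumerate(values))


def set_view(profile):
    """Order-free view: an unordered collection of (gene, value) pairs."""
    return frozenset(profile.items())


order_1 = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
order_2 = ["GENE_D", "GENE_C", "GENE_B", "GENE_A"]

score_1 = position_weighted_sum(sequence_view(profile, order_1))
score_2 = position_weighted_sum(sequence_view(profile, order_2))

# The same biology, written down with the genes in a different order.
reordered = {gene: profile[gene] for gene in order_2}

print("Order-sensitive summaries differ:", score_1 != score_2)
print("Set-based views are identical:", set_view(profile) == set_view(reordered))
```

In a sentence, position carries meaning, so an order-sensitive model is appropriate. In an expression profile, any ordering is an arbitrary bookkeeping choice, which is one reason to prefer modeling approaches that do not depend on it.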

Perturbation data should be treated as a modality in its own right. It captures cause and response — telling the model what happens when the system is pushed.

Xaira has also incorporated multiple sources of biological prior knowledge — including literature and molecular interaction networks — alongside experimental data. The most sophisticated models will not only combine images, omics, and clinical data. They will also integrate the accumulated scientific understanding of pathways, interactions, mechanisms, and constraints.

For pharma executives, this reframes platform evaluation. A model trained only to represent observed biological states may be useful for analysis. A model trained on interventional and causal data may be useful for decision-making. The distinction matters.

4. When quantity becomes quality

One of the most useful observations I heard repeatedly: in biology, experimental quantity can eventually become a new kind of quality. This sounds counterintuitive because pharma has historically prized careful, precise, low-throughput experimentation. That standard still matters. But the history of single-cell genomics showed that very large volumes of individually noisy measurements can reveal biological structure that small, precise experiments cannot. When the field moved from a few careful measurements to thousands of droplet-based measurements per second, algorithms began discovering real cell-type diversity across tissues and disease.

The lesson is not that data quality no longer matters. The lesson is that some biological signals emerge only at distributional scale, when there are enough measurements to characterize a whole population rather than a handful of samples.
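The statistical intuition behind this can be sketched in a few lines. The example below is a simplified simulation under stated assumptions, not a model of any real assay: the two cell types, the size of the true difference, and the noise level are all hypothetical. It shows how a small difference that is invisible in a few noisy measurements becomes clear once the number of measurements is large, because the standard error of a mean shrinks with the square root of the sample size.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_DIFFERENCE = 0.5  # hypothetical gap in mean expression between two cell types
NOISE_SD = 2.0         # hypothetical per-cell noise, much larger than the gap


def measure(true_mean, n_cells):
    """Simulate noisy single-cell measurements around a true mean."""
    return [random.gauss(true_mean, NOISE_SD) for _ in range(n_cells)]


def estimated_difference(n_cells):
    """Estimate the gap between two cell types from n_cells of each."""
    type_a = measure(1.0, n_cells)
    type_b = measure(1.0 + TRUE_DIFFERENCE, n_cells)
    return mean(type_b) - mean(type_a)


def standard_error(n_cells):
    """Standard error of the difference between two independent means."""
    return NOISE_SD * (2 / n_cells) ** 0.5


for n in (10, 1_000, 100_000):
    estimate = estimated_difference(n)
    print(f"n={n:>7}  estimated difference={estimate:+.3f}  "
          f"standard error={standard_error(n):.3f}")
```

Real single-cell analysis involves far more than averaging, but the same principle applies: with enough cells, structure that is buried in per-cell noise becomes statistically recoverable.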

Noetik is applying this logic in oncology, generating aligned three-modality data (H&E, spatial proteomics, and spatial transcriptomics) from thousands of cancer patients per year across solid tumors. Each patient becomes another increment in understanding the relationship between morphology, protein expression, and gene activity in real human tissue. The longer this data engine operates, the harder it becomes to replicate.

For enterprise pharma, this changes how partnerships should be assessed. The key question is not only "How accurate is the model today?" It is also "What data engine is improving this model over time?" A platform with a repeatable data-generation loop may be more strategically valuable than a model with impressive but static performance.

5. The business is starting to follow the science

The clearest signal that multimodal AI is maturing in pharma is that deal structures are changing around it.

AbbVie has created a BD function focused on AI across the R&D value chain, with diligence capabilities that look different from traditional pharma BD: model architecture review, training data assessment, computational talent evaluation, and questions about whether an AI platform maps to real R&D decisions. The asset is not always a molecule. Sometimes the asset is the translation system that helps generate or select better molecules.

Noetik's non-exclusive model license with GSK illustrates one emerging deal structure. If a model's value lies in its training data, generalization capability, and continuous improvement, non-exclusive licensing can be commercially rational — the model can serve multiple pharma partners without its value being destroyed. This is structurally different from a traditional exclusive asset-license mindset.

Manifold Bio's collaboration with Roche — $55 million upfront and potential milestones exceeding $2 billion — shows another pattern. The collaboration targets a specific biological translation problem: getting drugs across the blood-brain barrier. Manifold's mDesign platform connects protein design, molecular barcoding, in vivo biology, and sequencing readout into a single discovery loop. It is multimodal in a physical and experimental sense, not just a computational one.

The lesson is consistent: pharma does not buy platform architecture in the abstract. Pharma buys better decisions, better molecules, better patient selection, and better odds of clinical success.

6. What pharma leaders should look for

For pharma executives evaluating multimodal AI platforms, five practical questions help separate the real from the overstated.

What specific biological translation problem does it solve? "We combine imaging, omics, and clinical data" is not enough. The stronger claim is: "We can infer this expensive biological layer from this cheaper, widely available input," or "We can predict this intervention response from this experimental system."
Does the company own or continuously generate scarce paired data? Aligned multimodal data (H&E paired with spatial transcriptomics, perturbation paired with single-cell response, live-cell imaging paired with molecular readout) is often more valuable than raw data volume. Ask what keeps the pairing proprietary and compounding.
Is the model describing biology or predicting biology under intervention? Descriptive models improve analysis. Interventional models can change discovery strategy. Pharma needs both, but they should not be confused with each other.
Does the model output map to a real R&D decision? The best AI platform is not the one with the most elegant architecture. It is the one that helps a team decide which target to pursue, which molecule to optimize, which patient to enroll, or which indication to prioritize.
What validation standard would make this model operationally trusted? For research acceleration, internal validation may suffice. For clinical decision support or regulatory submission, the bar is much higher. Separate AI as a discovery accelerator from AI as a clinical decision tool: they require different evaluation frameworks.

The field is moving quickly, but unevenly. Some capabilities are already useful. Some are promising but not yet operational. Some are technically impressive but disconnected from pharma workflows. The role of leadership is to tell the difference.

What this piece does not cover yet

This article focuses on multimodal AI as biological translation infrastructure. Three important topics deserve separate treatment.

→ Regulation. FDA's evolving framework for AI in drug development will shape how multimodal models are validated, submitted, and trusted. There is a significant difference between AI as a lab accelerator and AI as a clinical decision tool.

→ EHR multimodality. Combining structured clinical data with imaging, genomics, notes, and longitudinal outcomes inside hospital or pharma systems involves a different class of complexity: inconsistent coding, missing fields, population bias, and workflow fragmentation.

→ The 'binder is not a drug' problem. A computationally designed molecule that binds a target is not automatically a drug candidate. Lead optimization requires balancing many properties simultaneously. Multimodal AI may help here, but it remains one of the hardest unsolved areas in AI-driven drug discovery.

The bridge still needs to be built

The larger transition is clear. The frontier of AI in pharma is moving from representation to translation: from models that describe biological states to systems that infer, predict, and help intervene across biological scales. The companies moving fastest have identified specific translation problems and built the data infrastructure required to solve them.

This is also where my work at AGI House sits. I spend much of my time between two groups that do not yet speak the same language fluently: frontier AI builders who see what is technically becoming possible, and enterprise pharma leaders who must decide what is trustworthy, adoptable, and worth integrating into real R&D systems. The most valuable conversations are no longer generic discussions about "AI in drug discovery." They are specific conversations about translation problems: morphology to molecular state, perturbation to phenotype, trajectory to outcome, model output to R&D decision.

Our recent AI for Life Sciences Salon Dinner, hosted in partnership with Premji Invest, was designed around exactly that kind of conversation — bringing together researchers, AI builders, investors, and clinical scientists to examine what multimodal biological AI can and cannot yet do, and what trustworthy adoption in medicine actually requires. The writeup from that evening is available on our research blog and offers a view into where the field's deepest unsolved questions still sit.

If you are working through multimodal data strategy, AI platform evaluation, or the question of which frontier capabilities are ready for deployment, this is the conversation we want to help advance. AGI House's enterprise biopharma initiative brings AI builders and pharma organizations together around defined technical challenges. Our AI × BioPharma Summit in Summer 2026 is one step in that direction.

The bridge between what is technically possible and what is operationally deployed is still under construction. But in multimodal biology, it is becoming much clearer where that bridge needs to be built.

Beyond Data Fusion: Multimodal AI Is Becoming Biology's Translation Layer