bio.rodeo

Verily Multimodal EHR + Genomics Foundation Model

Verily Life Sciences

A GPT-2-style EHR foundation model that integrates polygenic risk scores via cross-attention, enabling zero-shot disease prediction from clinical and genetic data.

Released: October 2025
Parameters: 155 Million

Electronic health record (EHR) foundation models have demonstrated that self-supervised pretraining over longitudinal clinical sequences can learn transferable representations of patient health, much as language models learn from text. Most of these models, however, treat the EHR as the sole modality and ignore the genetic factors that shape disease risk. The Verily Multimodal EHR + Genomics Foundation Model, described by Amar and colleagues at Verily Life Sciences in an October 2025 preprint, addresses this gap by making genomics a first-class input alongside the clinical timeline.

The model's central contribution is the integration of polygenic risk scores (PRS) as a foundational data modality. Rather than appending genetic features as static covariates, the architecture fuses PRS into a GPT-2-style autoregressive EHR backbone through a cross-attention mechanism, allowing the model to condition its predictions about a patient's clinical trajectory on inherited risk. It is pretrained on participant data from the All of Us Research Program, a large and demographically diverse U.S. cohort, positioning the work as an early example of genomics-aware multimodal foundation models built on real-world, controlled-access biobank data.
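The fusion step described above can be sketched as single-head cross-attention in which each EHR event state forms a query and the PRS embeddings supply keys and values. This is a minimal illustration, not the authors' implementation: the dimensions, the random weights, the residual addition, and the idea that each polygenic risk score is already embedded as one vector (`prs_tokens`) are all assumptions made for the example.

```python
import math
import random


def linear(x, w):
    """Multiply a vector x (length d_in) by a weight matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(ehr_states, prs_tokens, w_q, w_k, w_v):
    """EHR states act as queries; PRS embeddings act as keys and values."""
    keys = [linear(p, w_k) for p in prs_tokens]
    values = [linear(p, w_v) for p in prs_tokens]
    scale = math.sqrt(len(keys[0]))
    fused, attn_weights = [], []
    for h in ehr_states:
        q = linear(h, w_q)
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        context = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        # Residual connection (assumed): genetic context is added to the EHR state.
        fused.append([hi + ci for hi, ci in zip(h, context)])
        attn_weights.append(weights)
    return fused, attn_weights


def random_matrix(rng, rows, cols):
    """Small random matrix standing in for learned parameters or embeddings."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 5 EHR events, 3 PRS embeddings, model width 8.
rng = random.Random(0)
d_model, n_events, n_prs = 8, 5, 3
ehr_states = random_matrix(rng, n_events, d_model)
prs_tokens = random_matrix(rng, n_prs, d_model)
w_q, w_k, w_v = (random_matrix(rng, d_model, d_model) for _ in range(3))

fused, attn = cross_attention(ehr_states, prs_tokens, w_q, w_k, w_v)
print(len(fused), len(fused[0]), [round(sum(a), 6) for a in attn])
```

Because every event position attends to the same PRS embeddings, the genetic signal can influence the representation at each step of the timeline, which is the property the paragraph contrasts with static covariates.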

Once pretrained, the model supports zero-shot disease prediction and can be adapted to downstream classification tasks through transfer learning without retraining the backbone. The authors emphasize Type 2 Diabetes prediction as a demonstration of how combining EHR context with genetic predisposition improves risk estimation over EHR-only baselines.
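The source does not spell out how zero-shot prediction is performed. One approach commonly used with autoregressive EHR models is to sample future trajectories from the pretrained model and report the fraction in which the target diagnosis appears. The sketch below illustrates that idea only; `toy_sample_next` is a hand-written stand-in for the model, and the event codes (`VISIT`, `HBA1C_HIGH`, `T2D`) and probabilities are hypothetical.

```python
import random


def zero_shot_risk(sample_next, history, target_code, horizon, n_samples, seed=0):
    """Monte Carlo risk estimate: share of sampled futures that contain target_code."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        seq = list(history)
        for _ in range(horizon):
            code = sample_next(seq, rng)
            seq.append(code)
            if code == target_code:
                hits += 1
                break  # the event occurred in this sampled future
    return hits / n_samples


def toy_sample_next(seq, rng):
    """Stand-in for a pretrained model's next-event sampler (assumption)."""
    # Hypothetical rule: an elevated HbA1c event raises the per-step T2D probability.
    p_t2d = 0.15 if "HBA1C_HIGH" in seq else 0.02
    return "T2D" if rng.random() < p_t2d else "VISIT"


risk_high = zero_shot_risk(
    toy_sample_next, ["VISIT", "HBA1C_HIGH"], "T2D", horizon=5, n_samples=2000
)
risk_low = zero_shot_risk(
    toy_sample_next, ["VISIT"], "T2D", horizon=5, n_samples=2000
)
print(risk_high, risk_low)
```

No task-specific parameters are trained here: the estimate comes entirely from the generative model's own next-event distribution, which is what makes the prediction zero-shot.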

#Key Features

  • PRS as a foundational modality: Polygenic risk scores are integrated into the model via cross-attention rather than treated as auxiliary features, letting genetic predisposition directly inform clinical-sequence modeling.
  • GPT-2-style EHR backbone: A 155M-parameter autoregressive transformer models longitudinal EHR events, following the language-model paradigm adapted to structured clinical data.
  • Zero-shot disease prediction: After pretraining, the model can estimate disease risk for conditions such as Type 2 Diabetes without task-specific fine-tuning.
  • Transfer learning to downstream tasks: Learned representations can be reused for custom classification problems without retraining the full model, lowering the data and compute cost of new applications.
  • Built on a diverse biobank: Pretraining on the All of Us Research Program (~135,000 participants) grounds the model in a large, demographically varied real-world cohort.
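The transfer-learning feature listed above can be illustrated with a linear probe: patient embeddings from the frozen backbone are treated as fixed inputs, and only a small logistic-regression head is trained. This is a sketch under assumptions, not the authors' procedure; the two-dimensional embeddings and labels are made up for the example, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression head with batch gradient descent.

    The embeddings are never modified, mirroring a frozen backbone.
    """
    dim = len(embeddings[0])
    n = len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Predicted probability of the positive class for one embedding."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Synthetic frozen embeddings (assumption) with binary task labels.
embeddings = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9]]
labels = [1, 1, 0, 0]

w, b = train_probe(embeddings, labels)
print(predict(w, b, [0.9, 0.1]), predict(w, b, [0.1, 0.9]))
```

Only `dim + 1` parameters are trained, which is why this style of adaptation needs far less labeled data and compute than retraining the full model.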

#Technical Details

The architecture couples a GPT-2-style autoregressive transformer (approximately 155 million parameters) that encodes the longitudinal EHR with a cross-attention pathway that injects polygenic risk scores into the model's representations. This design lets genetic signal modulate predictions across the patient timeline instead of acting as a one-time input. Pretraining uses data from roughly 135,000 All of Us participants who have both linked EHR and genomic data. The authors evaluate the model on disease prediction, highlighting Type 2 Diabetes, and on transfer-learning setups where pretrained representations are adapted to new classification targets; reported results indicate that adding the PRS modality improves predictive performance relative to EHR-only configurations. The work is a Verily project associated with a Verily–NVIDIA precision-health AI collaboration.
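A comparison between an EHR-only configuration and an EHR + PRS configuration is typically made by scoring the same held-out patients with both and comparing a discrimination metric. The source does not state which metric the authors report, so the sketch below assumes AUROC as an example. The labels and scores are synthetic and do not reproduce any result from the preprint.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Equals the probability that a random positive outscores a random negative,
    counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic example data (assumption): 1 = developed the condition, 0 = did not.
labels = [1, 1, 1, 0, 0, 0]
ehr_only_scores = [0.7, 0.4, 0.6, 0.5, 0.3, 0.2]  # hypothetical EHR-only risk scores
ehr_prs_scores = [0.8, 0.6, 0.7, 0.5, 0.3, 0.2]   # hypothetical EHR + PRS risk scores

print(auroc(labels, ehr_only_scores), auroc(labels, ehr_prs_scores))
```

The pairwise form is quadratic in cohort size; at biobank scale a rank-based implementation from an established library would be the practical choice.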

#Applications

The model is aimed at precision-health and clinical research settings where both phenotypic history and genetic risk are available. Potential use cases include risk stratification for common polygenic conditions, cohort enrichment for clinical studies, and use as a pretrained backbone that downstream teams can adapt to specialized prediction tasks with limited labeled data. Because it learns from linked EHR and genomic records, it is particularly relevant to biobank-scale research programs and health systems exploring genomics-informed decision support.

#Impact

The model illustrates a broader trend toward multimodal foundation models that unify clinical and molecular data, and it is among the early efforts to treat polygenic risk as a native modality within an EHR foundation model rather than a bolt-on feature. Its reliance on the controlled-access All of Us Research Program is also a key limitation for reproducibility and reuse: neither the model weights nor the training code is publicly available, since the model is trained on restricted participant data. As a result, the work is best read as a methodological demonstration of genomics–EHR fusion whose independent validation and external adoption will depend on future releases or replication on accessible cohorts.

Citation

Integrating Genomics into Multimodal EHR Foundation Models

Preprint

Amar, J., et al. (2025) Integrating Genomics into Multimodal EHR Foundation Models. bioRxiv.

DOI: 10.1101/2025.10.26.684668

Recent citations

Papers that recently cited this model.

  • Cross-Modal Generative Augmentation for Multimodal Biological Classification. Hyunwoo Yoo, Efstathia Soufleri, Deepak Ravikumar, et al. Citations: 0

Citations

Total Citations: 1
Influential: 0
References: 26

Fields of citing research

  • Biology: 100%
  • Computer Science: 100%

Share of papers citing this model.

Openness

bio.rodeo openness: Closed · low usability and reproducibility
8 · Closed
Usability (can I run it?): 7
Reproducibility (can I retrain it?): 7
Model Openness Framework: Unclassified
Restrictive license on core components

Tags

cross_attention, disease_risk_prediction, electronic_health_records, foundation_model, genomics, multimodal, transfer_learning, transformer, variant_effect_prediction, zero_shot

Resources

  • Research Paper
  • Official Website