Models (32)
A graph foundation model for fMRI brain networks, pretrained across 27 datasets with graph and language prompts for zero-shot and few-shot generalization to unseen disorders.
A multimodal Q-former that fuses DNA sequence, gene context, protein function, and text into a prefix for a frozen LLM, enabling zero-shot genetic variant interpretation.
Proteo-R1
Stanford University / University of Tokyo / RIKEN Center for Advanced Intelligence Project / Chinese University of Hong Kong
Released May 1, 2026
A reasoning-guided foundation model for de novo antibody CDR design, pairing a multimodal-LLM understanding expert with a Boltz-1-based diffusion generation expert.
A 110M-parameter multimodal RNA language model that designs RNA sequences from secondary structure, consensus, and Gene Ontology constraints via discrete diffusion.
Generative pipeline for epitope-targeted de novo antibody (nanobody) CDR design that yields nanomolar binders from only dozens of designs per antigen.
A generative diffusion-transformer foundation model that embeds H&E histology, RNA profiles, and clinical text in a shared latent space for zero-shot cross-modal synthesis.
A hierarchical multimodal foundation model integrating spatial transcriptomics and H&E histology for biological discovery and platform-agnostic clinical prediction.
A graph-attention model producing context-aware protein embeddings from protein-protein interaction, co-expression, and tissue networks, with biologically motivated data splits.
A short-context masked DNA language model trained on curated regulatory sequences with a motif-discovery regularizer for zero-shot TF motif recovery and variant effect prediction.
A structure-aware graph-attention model that predicts A-to-I RNA editing across tissues and species from sequence and secondary structure, with released pretrained weights.
A lightweight graph-convolutional foundation model for spatial transcriptomics that learns spatially coherent, interpretable spot embeddings via masked central-spot prediction.
Single-cell foundation model using tabular attention over context cells to enable zero-shot representation and in-context prediction of arbitrary perturbations.
Multi-modal flow-matching model that co-designs the sequence, structure, and molecular surface of therapeutic peptides targeting protein-protein interactions.
A 3D vision-language foundation model for abdominal CT that pretrains on paired scans, radiology reports, and structured EHR codes for zero-shot interpretation.
GMAI-VL-R1
Shanghai AI Laboratory / Fuzhou University / Shanghai Innovation Institute / Fudan University / Monash University / University of Washington / Stanford University
Released April 2, 2025
A reinforcement-learning-enhanced general medical vision-language model that adds step-by-step reasoning for medical image diagnosis and visual question answering.
A vision-language foundation model for precision oncology that pretrains on 50M pathology images and 1B text tokens via unified masked modeling.
Self-supervised Vision Transformer models trained on proteome-wide fluorescence microscopy images from the Human Protein Atlas for subcellular protein localization.
CHIEF
Harvard Medical School / Brigham and Women's Hospital / Stanford University
Released September 4, 2024
A weakly supervised pathology foundation model pretrained on 60,530 whole-slide images across 19 anatomical sites for cancer detection, prognosis, and molecular prediction.
BiomedGPT
Lehigh University / University of Georgia / Stanford University / Massachusetts General Hospital / University of Pennsylvania / University of Central Florida / UC Santa Cruz / UTHealth Houston / Mayo Clinic / Samsung Research America
Released August 7, 2024
Open-source, lightweight generalist vision-language foundation model for diverse biomedical imaging and text tasks.
LLaVA-Tri
UC Santa Cruz / Huazhong University of Science and Technology / Harvard University / Stanford University
Released August 6, 2024
A medical multimodal large language model pretrained on the 25M-image MedTrinity-25M dataset, achieving state-of-the-art accuracy on biomedical visual question answering.
Semi-supervised cryo-ET segmentation framework that adapts DINOv2 vision transformers for 3D organelle annotation using sparse 2D slice labels.
A multi-modal contrastive foundation model for sleep analysis, learning joint representations across brain activity, ECG, and respiratory polysomnography signals.
FMCIB (Foundation Model for Cancer Imaging Biomarkers)
Harvard Medical School / Dana-Farber Cancer Institute / Brigham and Women's Hospital / Massachusetts General Hospital / Maastricht University / Aarhus University / Stanford University
Released March 15, 2024
A self-supervised 3D CT foundation model that extracts general-purpose tumor representations for cancer imaging biomarker discovery across diverse downstream tasks.
RNA foundation model trained on chemical-mapping data from millions of sequences, predicting reactivity, secondary structure, and degradation.
An instruction-tuned vision-language foundation model from Stanford for interpreting and summarizing chest X-rays across eight clinical task types.
Zero-shot foundation model for single-cell gene expression that generates species-agnostic cell embeddings using protein language model representations of gene products.
CLIP-based vision-language foundation model for pathology, fine-tuned on 208,414 image-text pairs. Enables zero-shot tissue classification and image retrieval.
Med-Flamingo
Stanford University / Harvard Medical School / Hospital Israelita Albert Einstein
Released July 27, 2023
A multimodal medical vision-language model that performs few-shot generative visual question answering over medical images and text.
Genomic foundation model using the Hyena operator to process DNA at single-nucleotide resolution with context lengths up to 1 million tokens, 500x longer than transformer-based predecessors.
Efficient Evolution of Human Antibodies from Protein Language Models
Stanford University
Released April 24, 2023
Zero-shot antibody affinity maturation using ESM pseudolikelihood scoring. Improves binding up to 160-fold with no antigen-specific training data.
A text-conditioned latent diffusion model that generates realistic synthetic chest X-rays from free-form radiology prompts, adapting Stable Diffusion to the medical imaging domain.
Self-supervised vision-language model for zero-shot detection of chest X-ray pathologies, trained on image-report pairs without explicit labels.