Channel-agnostic Vision Transformer trained on 3M+ Cell Painting images via masked autoencoder, producing 384-dimensional morphological embeddings for zero-shot phenotypic analysis.
OpenPhenom-S/16 is a publicly released foundation model for high-content microscopy developed by Recursion Pharmaceuticals. It applies a Channel-Agnostic Masked Autoencoder (CA-MAE) architecture to Cell Painting images, generating compact morphological embeddings that capture the phenotypic state of cells without requiring any labeled training data. The underlying research was presented as a spotlight paper at CVPR 2024 and the model weights were made publicly available in November 2024 via HuggingFace and Google Cloud Vertex AI Model Garden.
The model addresses a fundamental challenge in phenomics: microscopy datasets are acquired under varied experimental conditions with different fluorescence channel configurations, making it difficult to train a single model that generalizes across assays. Conventional vision models stack channels as fixed-depth tensors, requiring a consistent channel count at inference. OpenPhenom-S/16 overcomes this by processing each fluorescence channel independently through patch tokenization and then fusing information across channels via cross-attention, enabling inference on images with any number or ordering of channels.
OpenPhenom-S/16 is the publicly accessible member of Recursion's broader Phenom model family, which includes proprietary larger models (Phenom-1 and Phenom-2) trained on internal datasets of tens of millions of wells. The public release gives the academic community access to the architecture and weights trained on open datasets, along with pre-computed embeddings for the RxRx3-core benchmark.
OpenPhenom-S/16 is built on a Vision Transformer Small backbone with 16x16 pixel patch size (ViT-S/16), totaling approximately 22 million parameters. The key architectural innovation is channelwise cross-attention: rather than stacking fluorescence channels into a single multi-channel input tensor, the model processes each channel's patch tokens independently and then applies cross-attention across channels to build a contextualized representation. This design permits inference on images with arbitrary channel count and ordering. Input images are 256x256 pixels in uint8 format; each image produces a single 384-dimensional embedding.
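The channelwise idea can be illustrated with a minimal numpy sketch. This is not the released implementation: the token dimension (64), the random projection weights, and the single-head attention are all toy assumptions. It only shows the structural point that each channel is tokenized on its own and its tokens then attend over the pooled tokens of every channel, so the channel count never has to be fixed in advance:

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(channel_img, patch=16):
    # Split one 256x256 channel into non-overlapping 16x16 patches,
    # flattened to rows: (256 tokens, 256 pixel values).
    h, w = channel_img.shape
    n = h // patch
    return (channel_img.reshape(n, patch, n, patch)
                       .transpose(0, 2, 1, 3)
                       .reshape(n * n, patch * patch))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    # Toy single-head attention: one channel's tokens (queries)
    # attend over the concatenated tokens of all channels (context).
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)
    return softmax(scores) @ context

# A toy 3-channel 256x256 image; the loop below works for any channel count.
img = rng.random((3, 256, 256)).astype(np.float32)

# 1. Tokenize each channel independently (the channel-agnostic step).
W_embed = rng.standard_normal((256, 64)) * 0.05  # stand-in patch embedding
tokens = [patchify(c) @ W_embed for c in img]    # each (256 tokens, 64 dims)

# 2. Fuse across channels via cross-attention over the pooled token set.
context = np.concatenate(tokens, axis=0)         # (3 * 256, 64)
fused = [cross_attention(t, context) for t in tokens]

# 3. Pool to a single per-image embedding (the real model emits 384 dims).
embedding = np.concatenate(fused, axis=0).mean(axis=0)
```

Because the fusion step operates on a concatenated token set rather than a stacked tensor, adding, dropping, or reordering channels changes only the size of `context`, not the model's weights.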
The model was pretrained on over three million microscopy images from two publicly accessible Cell Painting datasets: RxRx3 (Recursion's public high-content screening dataset with six fluorescence channels) and JUMP-CP (the Joint Undertaking for Morphological Profiling - Cell Painting dataset from multiple laboratories under varied conditions). The differing channel configurations of these two datasets directly motivated the channel-agnostic design. Benchmarks reported at CVPR 2024 show that ViT-based masked autoencoders achieve up to an 11.5% relative improvement over weakly supervised classifiers in recalling known biological relationships from the StringDB protein interaction database, with CA-MAEs generalizing effectively to held-out JUMP-CP conditions.
OpenPhenom-S/16 targets researchers working in high-content screening and phenomics who need general-purpose morphological representations. Key use cases include morphological profiling of compound or genetic perturbation screens to cluster agents by phenotypic similarity and identify mechanism-of-action groups; compound-gene interaction prediction using embedding cosine similarity for zero-shot target identification; and cross-assay transfer to images acquired with different microscopes, staining conditions, or channel configurations. The model is also well-suited to drug discovery workflows for phenotypic screening of small molecules, complementing genomic and proteomic data. The bundled RxRx3-core embeddings lower the barrier to entry for groups without GPU infrastructure.
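The cosine-similarity use case can be sketched with toy data. The 384-dimensional shape matches the model's output, but the embeddings below are random stand-ins, and the argmax ranking is an illustrative assumption rather than Recursion's published pipeline:

```python
import numpy as np

def cosine_similarity(a, b):
    # Row-normalize both embedding matrices, then take inner products.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

rng = np.random.default_rng(1)
compound_emb = rng.standard_normal((5, 384))   # toy compound-well embeddings
gene_emb = rng.standard_normal((10, 384))      # toy gene-knockout embeddings

sims = cosine_similarity(compound_emb, gene_emb)   # (5 compounds, 10 genes)

# Zero-shot target hypothesis: for each compound, the gene whose knockout
# phenotype is morphologically closest in embedding space.
top_gene = sims.argmax(axis=1)
```

In practice, embeddings would be aggregated over replicate wells and batch-corrected before comparison, since plate effects can dominate raw cosine similarity.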
OpenPhenom-S/16 is one of the first openly released foundation models designed specifically for Cell Painting microscopy, filling a gap between proprietary pharmaceutical-scale models and general-purpose computer vision models not adapted for fluorescence imaging. Its CVPR 2024 spotlight recognition indicates peer validation of the channel-agnostic masked autoencoder approach as a meaningful advance in biological image representation learning. The model demonstrates a scalable pretraining paradigm: performance improves predictably with both model size and dataset scale, as validated by Recursion's internal Phenom-1 (ViT-L/8, 3.5 billion image crops) and Phenom-2 models. Notable limitations include a non-commercial license that restricts industrial use, applicability limited to Cell Painting fluorescence images (performance on brightfield and phase-contrast modalities is uncharacterized), and reduced representational capacity relative to Recursion's proprietary larger models. Image preprocessing — illumination correction, channel normalization, and resizing to 256x256 — remains the user's responsibility.
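A minimal sketch of the kind of user-side preprocessing this implies. The percentile-based normalization and center crop below are common choices in high-content imaging, not steps prescribed by the model release, and the 12-bit input range is an assumption for illustration:

```python
import numpy as np

def normalize_to_uint8(channel, low_pct=1.0, high_pct=99.0):
    # Percentile-based intensity rescaling to uint8; the exact scheme
    # (percentiles, per-plate statistics, etc.) is the user's choice.
    lo, hi = np.percentile(channel, [low_pct, high_pct])
    scaled = np.clip((channel - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
    return (scaled * 255).astype(np.uint8)

def center_crop(channel, size=256):
    # Crop to the model's expected 256x256 spatial size.
    h, w = channel.shape
    top, left = (h - size) // 2, (w - size) // 2
    return channel[top:top + size, left:left + size]

# Toy 6-channel stack with a 12-bit-like intensity range.
raw = np.random.default_rng(2).random((6, 512, 512)) * 4095

prepared = np.stack([normalize_to_uint8(center_crop(c)) for c in raw])
# prepared is now a (channels, 256, 256) uint8 stack, one channel at a time,
# matching the per-channel input format the model expects.
```

Illumination correction (e.g. fitting and dividing out a smooth background field per channel) would precede these steps in a production pipeline and is deliberately omitted here.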