bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 bio.rodeo. All rights reserved.
Imaging

CryoLens

Chan Zuckerberg Initiative

A variational autoencoder for interpretable 3D reconstruction and representation learning of protein subtomograms from cryo-ET data, trained on 5.8 million synthetic particles.

Released: 2025

Overview

CryoLens is a generative deep learning model that learns compact, interpretable representations of molecular structures directly from cryo-electron tomography (cryo-ET) subtomograms. Developed at the Chan Zuckerberg Imaging Institute (CZII), it addresses a fundamental challenge in cryo-ET structural biology: extracting meaningful structural information from individual particle subvolumes in real time, during or immediately after data collection, without the computationally intensive alignment and averaging pipeline that typically follows particle picking.

Conventional cryo-ET workflows proceed from particle picking to subtomogram extraction, iterative alignment, averaging, and eventually high-resolution structure determination. Each of these steps is computationally demanding and typically requires large numbers of particles. CryoLens takes a different approach: it trains a variational autoencoder (VAE) on a large corpus of synthetic subtomograms, simulated from known Protein Data Bank (PDB) structures with realistic cryo-ET artifacts such as the missing wedge and noise, and learns a low-dimensional latent space that encodes structural identity. The trained model embeds a new subtomogram into this latent space in a single forward pass, enabling rapid particle classification, structural clustering, and quality assessment without averaging.

CryoLens is available on the CZ Virtual Cells Platform (v0.0.1) and was developed in collaboration with Kyle Harrington at CZII. It applies Gaussian splatting, a rendering technique from computer graphics, to the problem of generating interpretable 3D density reconstructions from compact latent codes.
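
Gaussian splatting represents a volume as a sum of 3D Gaussian blobs, so rendering amounts to evaluating each Gaussian at every voxel and adding the results. The snippet below is a minimal pure-Python sketch of that idea, not CryoLens's decoder: it assumes axis-aligned (diagonal) covariances, and the names `Splat` and `render_splats` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Splat:
    """One 3D Gaussian: center (voxel units), per-axis std dev, amplitude.

    Assumption: a diagonal (axis-aligned) covariance keeps the sketch short;
    the exact covariance parameterization CryoLens uses is not given here.
    """
    center: tuple[float, float, float]
    sigma: tuple[float, float, float]
    amplitude: float


def render_splats(splats: list[Splat], size: int) -> list[list[list[float]]]:
    """Sum every splat's Gaussian density onto a size x size x size voxel grid."""
    grid = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for s in splats:
        cz, cy, cx = s.center
        sz, sy, sx = s.sigma
        for z in range(size):
            dz = ((z - cz) / sz) ** 2
            for y in range(size):
                dy = ((y - cy) / sy) ** 2
                row = grid[z][y]
                for x in range(size):
                    dx = ((x - cx) / sx) ** 2
                    # Unnormalized Gaussian: equals the amplitude at the center.
                    row[x] += s.amplitude * math.exp(-0.5 * (dz + dy + dx))
    return grid


# Toy example: two splats on an 8x8x8 grid (CryoLens uses 768 splats on 48x48x48).
volume = render_splats(
    [Splat((4.0, 4.0, 4.0), (1.0, 1.0, 1.0), 2.0),
     Splat((2.0, 5.0, 5.0), (1.5, 1.0, 1.0), 1.0)],
    size=8,
)
```

A practical decoder would typically evaluate all splats with vectorized, differentiable tensor operations on a GPU so that reconstruction gradients flow back to the splat parameters; the nested loops here only make the arithmetic explicit.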

Key Features

  • Variational autoencoder for subtomograms: Encodes 48×48×48 voxel subvolumes into a 40-dimensional latent space using a 4-layer 3D convolutional encoder with stride-2 downsampling and reparameterization for variational inference, enabling both embedding extraction and generative sampling.
  • Segmented Gaussian Splat decoder: Decodes latent vectors into 3D density maps through 768 Gaussian splats, producing interpretable, differentiable 3D reconstructions that provide visual feedback on the structural content encoded in the latent representation.
  • Pose estimation head: A dedicated linear layer predicts a 4-parameter axis-angle pose vector alongside the latent code, disentangling structural identity from orientation and enabling pose-invariant comparison of particles.
  • Large-scale synthetic pretraining: Trained on 5.8 million synthetic particles derived from 103 protein structures in the Protein Data Bank (PDB), with simulated missing wedge artifacts, realistic noise models, and uniform orientation sampling, at a voxel size of 10 Å.
  • Real-time structural feedback: Inference runs in a single forward pass on a GPU, enabling embedding extraction and approximate structural assessment during or immediately after cryo-ET data collection, without requiring alignment or averaging.
  • Contrastive affinity loss: Training combines a reconstruction loss that accounts for the missing wedge, the KL divergence term of the standard VAE objective, and a contrastive affinity loss that encourages structurally similar particles to occupy nearby regions of the latent space.
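
The pose head's output can be pictured with a standard axis-angle conversion. This sketch assumes the four values are a rotation angle in radians followed by a three-component axis; CryoLens's actual ordering and normalization are not documented here, and `axis_angle_to_matrix` and `rotate` are hypothetical helpers built on Rodrigues' rotation formula.

```python
import math


def axis_angle_to_matrix(pose: tuple[float, float, float, float]) -> list[list[float]]:
    """Convert an assumed (angle, ax, ay, az) pose vector to a 3x3 rotation matrix.

    Assumption: the first component is the angle in radians; the remaining
    three give the rotation axis, which is normalized here.
    """
    angle, ax, ay, az = pose
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    x, y, z = ax / norm, ay / norm, az / norm
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    # Rodrigues' rotation formula, written out element by element.
    return [
        [c + x * x * t, x * y * t - z * s, x * z * t + y * s],
        [y * x * t + z * s, c + y * y * t, y * z * t - x * s],
        [z * x * t - y * s, z * y * t + x * s, c + z * z * t],
    ]


def rotate(matrix: list[list[float]], point: tuple[float, float, float]) -> tuple[float, ...]:
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return tuple(sum(matrix[i][j] * point[j] for j in range(3)) for i in range(3))


# A 90-degree rotation about z maps the x axis onto the y axis.
R = axis_angle_to_matrix((math.pi / 2, 0.0, 0.0, 1.0))
print(rotate(R, (1.0, 0.0, 0.0)))  # ≈ (0.0, 1.0, 0.0) up to floating-point error
```

Negating the angle yields the inverse rotation, which is one way to picture how a predicted pose lets a particle be brought back to a canonical frame before comparison.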

Technical Details

The CryoLens encoder is a 3D convolutional neural network with four stride-2 downsampling layers (channels 8→16→32→64) and ReLU activations. Its flattened feature vector is projected to 40-dimensional mean and log-variance vectors, from which the reparameterization trick draws the latent sample. A separate linear head processes the encoder output to predict the 4-parameter axis-angle pose vector. The decoder uses the 40-dimensional latent sample to parameterize 768 Gaussian splats, each defined by a center position, covariance, and amplitude, which are rendered onto a 48×48×48 voxel grid to produce the reconstructed density volume.
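
The encoder's tensor shapes and the reparameterization step can be checked with a few lines of arithmetic. This sketch assumes kernel size 3 with padding 1 for each stride-2 convolution and a single-channel input, neither of which the description specifies; under those assumptions a 48-voxel edge shrinks to 3 after four layers, leaving 64 × 3³ = 1728 features for the mean and log-variance projections.

```python
import math
import random

# Assumptions (not stated in the source): single-channel input volume,
# kernel size 3 and padding 1 for every stride-2 convolution.
ENCODER_CHANNELS = [8, 16, 32, 64]
LATENT_DIM = 40


def conv_out(n: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output edge length of one convolution along a single axis."""
    return (n + 2 * padding - kernel) // stride + 1


def encoder_shapes(size: int = 48) -> list[tuple[int, int]]:
    """(channels, edge length) after each of the four stride-2 conv layers."""
    shapes = []
    for channels in ENCODER_CHANNELS:
        size = conv_out(size)
        shapes.append((channels, size))
    return shapes


def reparameterize(mu: list[float], logvar: list[float], rng: random.Random) -> list[float]:
    """Draw z = mu + sigma * eps, with sigma = exp(0.5 * logvar) and eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


shapes = encoder_shapes()  # [(8, 24), (16, 12), (32, 6), (64, 3)]
channels, edge = shapes[-1]
flat_features = channels * edge ** 3  # 64 * 27 = 1728 inputs to the mean/log-variance layers
z = reparameterize([0.0] * LATENT_DIM, [0.0] * LATENT_DIM, random.Random(0))
```

Sampling through `mu + sigma * eps` rather than directly from the posterior keeps the sample differentiable with respect to `mu` and `logvar`, which is why VAEs use this form during training.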

Training used the TomoTwin synthetic dataset, which contains simulated cryo-ET subtomograms of 103 proteins from the PDB. Each protein is represented at multiple orientations, with simulated contrast transfer function (CTF) effects, additive noise, and missing wedge masking applied to approximate real experimental conditions. The three training objectives are: (1) a reconstruction loss between the rendered Gaussian splat density and the input subtomogram, computed only over the region outside the missing wedge; (2) a KL divergence term penalizing deviation of the posterior from a unit Gaussian prior; and (3) a contrastive affinity term that pulls embeddings of the same protein together and pushes those of different proteins apart. Input volumes are preprocessed to 48×48×48 voxels at 10 Å per voxel, giving a box edge of 480 Å (48 nm), which suits medium-to-large protein complexes.
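
The first two objectives have standard forms, sketched below in plain Python on flattened volumes. The description does not give the exact form of CryoLens's contrastive affinity term or the loss weights, so `pair_affinity_loss` uses a generic margin-based contrastive loss as a stand-in, and `beta` and `gamma` are hypothetical weights.

```python
import math


def masked_mse(recon: list[float], target: list[float], mask: list[int]) -> float:
    """Mean squared error over voxels where mask is 1 (outside the missing wedge)."""
    total, count = 0.0, 0
    for r, t, m in zip(recon, target, mask):
        if m:
            total += (r - t) ** 2
            count += 1
    return total / count if count else 0.0


def kl_to_standard_normal(mu: list[float], logvar: list[float]) -> float:
    """KL(N(mu, exp(logvar)) || N(0, I)) for a diagonal Gaussian posterior."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def pair_affinity_loss(z_a, z_b, same_protein: bool, margin: float = 1.0) -> float:
    """Stand-in contrastive term (assumption): pull same-protein pairs together,
    push different-protein pairs at least `margin` apart."""
    dist = math.dist(z_a, z_b)
    if same_protein:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


def total_loss(recon_term: float, kl_term: float, affinity_term: float,
               beta: float = 1.0, gamma: float = 1.0) -> float:
    """Weighted sum of the three objectives; beta and gamma are hypothetical weights."""
    return recon_term + beta * kl_term + gamma * affinity_term
```

Restricting the reconstruction error to the observed region means the model is never penalized for what it predicts inside the missing wedge, where the input contains no information.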

Applications

CryoLens is most directly useful in cryo-ET workflows that need fast structural feedback before committing to the full subtomogram averaging pipeline. During a data collection session, an operator can pass particle picks through CryoLens in real time to check whether the session is yielding particles with consistent structural content, to detect imaging artifacts, or to identify contaminating particles. Downstream, the latent embeddings support unsupervised clustering of the particle population (analogous to 2D classification in single-particle cryo-EM) to identify distinct conformational or compositional states before undertaking computationally expensive 3D alignment. The Gaussian splat reconstruction gives a human-interpretable view of each cluster's structural content without any averaging, helping users decide which particle subsets to pursue for high-resolution reconstruction.
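
Clustering the latent embeddings needs nothing CryoLens-specific. The description does not name a clustering algorithm, so this sketch swaps in plain k-means (Lloyd's algorithm) as one reasonable choice; the toy 2-D points stand in for real 40-dimensional embeddings, and `kmeans` and `nearest` are hypothetical helpers.

```python
import math
import random


def nearest(point, centroids) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means over equal-length embedding vectors; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize with k distinct embeddings chosen reproducibly.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each embedding joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids


# Toy 2-D "embeddings" standing in for 40-D CryoLens latent vectors.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
labels, centroids = kmeans(embeddings, k=2)
```

k-means requires choosing the number of clusters in advance; with real particle populations, density-based or hierarchical clustering are equally plausible alternatives when the number of states is unknown.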

Impact

CryoLens demonstrates that Gaussian splatting, a technique from the neural rendering community, can produce interpretable, compact 3D representations useful for cryo-ET analysis, extending this rendering paradigm into structural biology. Its synthetic pretraining strategy, built on the TomoTwin dataset, offers a practical way to train on realistic, labeled subtomograms without exhaustive manual annotation of experimental data. As of early 2026, CryoLens is available as version 0.0.1 on the CZ Virtual Cells Platform and has not yet been described in a peer-reviewed paper; users should treat it as a research prototype. In particular, no systematic benchmark of its particle classification accuracy or clustering quality against established subtomogram analysis methods such as cryoDRGN and RELION 3D classification has been published.

Tags

particle picking, structure prediction, variational autoencoder, CNN, generative, representation learning, self-supervised, cryo-ET, structural biology

Resources

GitHub Repository
Official Website