bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
DNA & Gene foundation models

JEPA-DNA

NVIDIA

A training framework that grounds genomic foundation models with a joint-embedding predictive objective, learning to predict functional representations of masked DNA rather than reconstructing tokens.

Released: February 2026

Most genomic foundation models are trained with token-level objectives, either masked or autoregressive prediction of nucleotide tokens, which force the model to recover exact sequence content. That signal can overemphasize low-level token statistics at the expense of the functional meaning of a region. JEPA-DNA, introduced by NVIDIA's digital biology group in a February 2026 arXiv preprint, brings the Joint-Embedding Predictive Architecture (JEPA) from representation learning into genomics: instead of reconstructing masked nucleotides, the model predicts the functional representations of masked genomic segments, shifting the learning signal from token recovery to semantic alignment.
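To make the contrast concrete, here is a toy sketch of the two kinds of loss. It assumes a four-letter nucleotide vocabulary and uses random arrays as stand-ins for model outputs. `token_loss` and `embedding_loss` are hypothetical helpers for illustration, not functions from the JEPA-DNA code, and the squared-error distance is an assumption rather than the paper's stated loss.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = "ACGT"  # assumed single-nucleotide vocabulary (real backbones may use k-mers or BPE)

def token_loss(logits: np.ndarray, seq: str, masked: list[int]) -> float:
    """Token-level objective: cross-entropy of the true nucleotide at each masked position."""
    targets = np.array([VOCAB.index(seq[i]) for i in masked])
    z = logits[masked]
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(masked)), targets].mean())

def embedding_loss(predicted: np.ndarray, target: np.ndarray) -> float:
    """JEPA-style objective (assumed form): squared distance between predicted and target embeddings."""
    return float(((predicted - target) ** 2).sum(axis=1).mean())

seq = "ACGTTGCA"
masked = [2, 5]
logits = rng.normal(size=(len(seq), len(VOCAB)))       # stand-in for a token-prediction head
target_emb = rng.normal(size=(len(masked), 16))        # stand-in for target-encoder embeddings
pred_emb = target_emb + 0.1 * rng.normal(size=target_emb.shape)  # stand-in for predictor output
print(token_loss(logits, seq, masked), embedding_loss(pred_emb, target_emb))
```

The token loss only rewards getting the exact base right, whereas the embedding loss rewards landing near the target representation, which can be close even when several base choices are functionally equivalent.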

Rather than being a single new architecture trained from scratch, JEPA-DNA is a training framework that combines this joint-embedding predictive objective with conventional generative objectives, and can be applied as continual training to existing backbones. The authors report that this produces fixed genomic foundation-model checkpoints with improved performance across a broad evaluation suite, and they release the training and benchmarking code under Apache 2.0.

#Key Features

  • Joint-embedding predictive objective: Predicts learned representations of masked genomic segments instead of raw nucleotides, emphasizing functional semantics over token recovery.
  • Hybrid training signal: Combines the JEPA objective with traditional generative objectives within a single training framework.
  • Applicable to existing backbones: Implemented as continual training, with preconfigured setups for backbones such as DNABERT-2, NTv3, and HyenaDNA.
  • Broad benchmark gains: Reports improvements across 17 genomic benchmarks, with the authors claiming state-of-the-art results among genomic foundation models.
  • Open code (Apache 2.0): Pretraining and benchmarking code is publicly released.
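The hybrid training signal can be written schematically as a weighted sum. This is an assumed formulation for illustration only: the weight λ, the squared-error distance, and the stop-gradient sg on the target branch (borrowed from image JEPA variants) are not taken from the JEPA-DNA paper. Here M is the set of masked positions, f_θ the context encoder applied to the masked sequence x∖M, f̄ the target encoder applied to the full sequence x, g_φ the predictor, and L_gen the generative (masked or autoregressive) loss.

```latex
\mathcal{L} \;=\; \mathcal{L}_{\text{gen}} \;+\; \lambda\,\mathcal{L}_{\text{JEPA}},
\qquad
\mathcal{L}_{\text{JEPA}} \;=\; \frac{1}{|M|}\sum_{i \in M}
\Bigl\lVert \bigl[g_\phi\bigl(f_\theta(x_{\setminus M})\bigr)\bigr]_i
- \operatorname{sg}\bigl(\bigl[\bar f_{\bar\theta}(x)\bigr]_i\bigr) \Bigr\rVert_2^2
```

Setting λ = 0 recovers ordinary generative pretraining, so λ controls how strongly the embedding-prediction term shapes the representations.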

#Technical Details

JEPA-DNA augments genomic pretraining with a joint-embedding predictive objective: a context encoder and a target encoder produce representations, and a predictor learns to map the masked context to the target encoder's functional embeddings, complementing the generative losses.

The released repository provides run_jepa_pretrain.py for pretraining and integrates with a separate GFMBench-API for evaluation, with preconfigured parameter files for DNABERT-2, NTv3, and HyenaDNA backbones. Training produces context-encoder, target-encoder, and predictor checkpoints.

Across 17 genomic benchmarks, the framework is reported to set state-of-the-art results for genomic foundation models. The repository ships code under Apache 2.0 but does not release pretrained checkpoints, and training-data sources are configurable rather than fixed; specific corpora and quantitative results should be confirmed against the paper.
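The interplay between context encoder, target encoder, and predictor can be sketched in a few lines of NumPy. Everything here is a toy assumption: the "encoder" is an embedding lookup plus neighbour averaging rather than a DNABERT-2, NTv3, or HyenaDNA backbone; `encode`, `jepa_loss`, and `ema_update` are hypothetical names; the exponential-moving-average (EMA) target update is borrowed from image JEPA variants and is not confirmed for JEPA-DNA; and the gradient step on the context encoder and predictor is omitted, since a real run would rely on an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, VOCAB, DIM = 12, 5, 8  # toy sizes; ids 0-3 = A, C, G, T
MASK_ID = 4                     # hypothetical [MASK] token id

def encode(weights: np.ndarray, tokens: np.ndarray) -> np.ndarray:
    """Toy encoder: embedding lookup, then each position averaged with its two
    (circular) neighbours, a crude stand-in for attention so masked positions see context."""
    h = weights[tokens]
    mixed = (np.roll(h, 1, axis=0) + h + np.roll(h, -1, axis=0)) / 3.0
    return np.tanh(mixed)

def jepa_loss(ctx_w, tgt_w, pred_w, tokens, masked) -> float:
    """Predict target-encoder embeddings of masked positions from the masked context."""
    corrupted = tokens.copy()
    corrupted[masked] = MASK_ID          # the context encoder never sees the masked bases
    ctx = encode(ctx_w, corrupted)       # context-encoder representations
    tgt = encode(tgt_w, tokens)          # the target encoder sees the full sequence
    pred = ctx[masked] @ pred_w          # predictor maps context reps into target space
    return float(((pred - tgt[masked]) ** 2).sum(axis=1).mean())

def ema_update(tgt_w: np.ndarray, ctx_w: np.ndarray, momentum: float = 0.99) -> np.ndarray:
    """Assumed target update: slow exponential moving average of the context weights."""
    return momentum * tgt_w + (1.0 - momentum) * ctx_w

tokens = rng.integers(0, 4, size=SEQ_LEN)   # random A/C/G/T ids
masked = np.array([3, 4, 5])                 # one contiguous masked span
ctx_w = rng.normal(size=(VOCAB, DIM))        # in continual training this would be the backbone's weights
tgt_w = ctx_w.copy()                         # target encoder starts as a copy of the context encoder
pred_w = np.eye(DIM)                         # identity predictor as an untrained baseline

loss = jepa_loss(ctx_w, tgt_w, pred_w, tokens, masked)
# (gradient step on ctx_w and pred_w omitted in this sketch)
tgt_w = ema_update(tgt_w, ctx_w)
print(loss)
```

Keeping the target branch out of the gradient path and updating it slowly is the usual JEPA-style guard against both encoders collapsing to a constant; whether JEPA-DNA uses exactly this mechanism should be checked against the paper.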

#Applications

JEPA-DNA is primarily a recipe for improving genomic foundation models, so its main beneficiaries are groups that train or fine-tune DNA models and want stronger, more functionally grounded representations for downstream tasks such as regulatory element classification, variant effect prediction, and other GFMBench-style benchmarks. Because it operates as continual training over existing backbones, teams can upgrade models they already use rather than retraining from scratch.
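A common way to use such representations downstream is a linear probe on frozen embeddings. The sketch below assumes embeddings have already been extracted from a trained encoder; synthetic Gaussian clusters stand in for them, the binary task (e.g. regulatory vs. non-regulatory windows) is hypothetical, and the closed-form ridge probe is a generic choice, not the GFMBench-API evaluation protocol.

```python
import numpy as np

def fit_linear_probe(X: np.ndarray, y: np.ndarray, l2: float = 1e-2) -> np.ndarray:
    """Closed-form ridge regression on frozen embeddings; labels {0, 1} mapped to {-1, +1}."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    targets = 2.0 * y - 1.0
    reg = l2 * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0                                # do not penalise the bias term
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ targets)

def predict(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Threshold the linear score at zero to get 0/1 class predictions."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ w > 0).astype(int)

rng = np.random.default_rng(0)
# Synthetic stand-ins for frozen 16-dimensional embeddings of two classes of genomic windows.
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 16)),
               rng.normal(1.0, 1.0, size=(100, 16))])
y = np.array([0] * 100 + [1] * 100)

w = fit_linear_probe(X, y)
accuracy = (predict(w, X) == y).mean()
print(f"train accuracy: {accuracy:.3f}")
```

Because the encoder stays frozen, probe accuracy reflects how linearly accessible the task-relevant information is in the embeddings, which is one reason representation-learning papers often report it alongside full fine-tuning.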

#Impact

JEPA-DNA brings into genomics a representation-learning paradigm that has reshaped vision and speech, arguing that for DNA, predicting functional embeddings is a better objective than reconstructing tokens. Its broad reported benchmark gains and open Apache 2.0 code make the approach easy to evaluate, though the absence of released checkpoints means practitioners must run the training themselves. As a February 2026 preprint, its conclusions await peer review and independent replication.

Tags

representation_learning, variant_effect_prediction, transformer, joint_embedding_predictive_architecture, foundation_model, self_supervised, genomics, dna