bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Imaging

EchoJEPA

Bowang Lab

A Joint-Embedding Predictive foundation model for echocardiography, pretrained on 18M cardiac ultrasound videos to learn artifact-robust anatomical representations.

Released: February 2026
Parameters: 1.1 billion (EchoJEPA-G)

EchoJEPA is a self-supervised foundation model for echocardiography (cardiac ultrasound) developed by Alif Munim, Adibvafa Fallahpour, Teodora Szasz, Bo Wang, and colleagues at the University of Toronto, the Vector Institute, and the University Health Network. It was released as a preprint in February 2026 (arXiv:2602.02603). The model addresses a long-standing difficulty in ultrasound machine learning: echocardiograms are dominated by speckle, acoustic shadowing, and operator-dependent artifacts that confound pixel-reconstruction objectives and limit the transferability of supervised models across scanners and patient populations.

Rather than learning to reconstruct raw pixels, EchoJEPA adopts a Joint-Embedding Predictive Architecture (JEPA), in which the model predicts the latent representations of masked spatiotemporal regions from visible context. This latent predictive objective encourages the encoder to capture stable cardiac anatomy and motion while discarding stochastic ultrasound noise that has no predictable structure. EchoJEPA adapts the video JEPA paradigm (V-JEPA2) to the temporal characteristics of cardiac imaging, using higher frame sampling to resolve the rapid dynamics of the beating heart.
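The latent predictive objective described above can be illustrated with a toy NumPy sketch. Everything here is an assumption for illustration rather than EchoJEPA's code: the one-layer `encode` stand-in, the mean-pooled predictor inside the hypothetical `jepa_loss`, the 75% mask ratio, and the use of an L1 loss with an exponential-moving-average (EMA) target encoder, which follows the general V-JEPA recipe rather than anything stated on this page. What it shows is that the regression targets are embeddings of the hidden tokens, not their pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(tokens, weights):
    # Hypothetical stand-in for a ViT encoder: one linear map plus tanh per token.
    return np.tanh(tokens @ weights)

def jepa_loss(tokens, mask, pos, w_ctx, w_tgt, w_pred):
    """Toy latent-prediction loss for one clip.

    tokens: (N, D_in) flattened spatiotemporal patches
    mask:   (N,) bool, True where a token is hidden from the context encoder
    pos:    (N, D) positional embeddings (assumed, for illustration)
    """
    # Targets: the target encoder sees the full clip; only masked positions are
    # regressed. In training these targets are held fixed (stop-gradient);
    # NumPy computes no gradients here.
    targets = encode(tokens, w_tgt)[mask]
    # Context: the context encoder sees only the visible tokens.
    context = encode(tokens[~mask], w_ctx)
    summary = context.mean(axis=0)
    # Predictor: pooled context plus each masked token's position embedding.
    preds = np.tanh((summary + pos[mask]) @ w_pred)
    # L1 regression in representation space; no pixels are reconstructed.
    return float(np.abs(preds - targets).mean())

def ema_update(w_tgt, w_ctx, momentum=0.998):
    # Target weights slowly track the context encoder (exponential moving average).
    return momentum * w_tgt + (1.0 - momentum) * w_ctx

num_tokens, d_in, d = 32, 48, 16
tokens = rng.normal(size=(num_tokens, d_in))
mask = np.zeros(num_tokens, dtype=bool)
mask[rng.choice(num_tokens, size=24, replace=False)] = True  # hide 75% of tokens
pos = rng.normal(size=(num_tokens, d))
w_ctx = rng.normal(scale=0.1, size=(d_in, d))
w_tgt = w_ctx.copy()
w_pred = rng.normal(scale=0.1, size=(d, d))

loss = jepa_loss(tokens, mask, pos, w_ctx, w_tgt, w_pred)
w_tgt = ema_update(w_tgt, w_ctx)
```

In a real training loop, gradients of this loss would update the context encoder and predictor, and the EMA step would then refresh the target weights; the sketch only evaluates the loss once and applies a single EMA step.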

Trained on 18 million echocardiogram videos from roughly 300,000 patients — described as the largest pretraining corpus assembled for this modality — EchoJEPA produces general-purpose representations that transfer to clinical tasks including left ventricular ejection fraction (LVEF) estimation, right ventricular systolic pressure (RVSP) estimation, and echocardiographic view classification, with strong sample efficiency and cross-population generalization.

#Key Features

  • Latent predictive (JEPA) objective: By predicting masked regions in representation space rather than pixel space, EchoJEPA learns anatomical and motion features while ignoring speckle noise and acoustic artifacts that defeat reconstruction-based pretraining.
  • Cardiac-tuned video modeling: The model increases temporal sampling (24 fps acquisition, 16 frames per clip at 8 fps, tubelet size 2) relative to V-JEPA2's default settings to capture the fast dynamics of the cardiac cycle.
  • Extreme sample efficiency: EchoJEPA reaches 79% view-classification accuracy using only 1% of labeled data, compared with 42% for the best baseline trained on the full labeled set.
  • Robustness to acoustic degradation: Under physics-informed perturbations (depth attenuation and acoustic shadowing), the model degrades by roughly 2% versus about 17% for competing approaches, reflecting representations grounded in anatomy rather than image texture.
  • Cross-population generalization: Zero-shot transfer to pediatric echocardiograms surpasses fully fine-tuned baselines, indicating representations that generalize beyond the adult populations seen in pretraining.
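The clip settings in the cardiac-tuned bullet reduce to simple arithmetic. This sketch reads "24 fps acquisition" as the frame rate of the source video, which is an interpretation, and the helper names are hypothetical: resampling to 8 fps keeps every third frame, 16 such frames span 2 seconds, and a tubelet size of 2 leaves 8 temporal token positions per clip.

```python
def clip_frame_indices(start, num_frames=16, source_fps=24, target_fps=8):
    """Source-frame indices for one clip resampled from source_fps to target_fps."""
    if source_fps % target_fps != 0:
        raise ValueError("sketch assumes source_fps is an integer multiple of target_fps")
    stride = source_fps // target_fps  # 24 // 8 = 3: keep every third frame
    return [start + i * stride for i in range(num_frames)]

def clip_duration_s(num_frames=16, target_fps=8):
    """Temporal extent of one clip in seconds."""
    return num_frames / target_fps

def temporal_token_count(num_frames=16, tubelet=2):
    """Token positions along time after grouping consecutive frames into tubelets."""
    return num_frames // tubelet

indices = clip_frame_indices(start=0)
```

Here `indices` is `[0, 3, 6, ..., 45]`, so one clip draws on 46 consecutive source frames.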

#Technical Details

EchoJEPA is built on a Vision Transformer encoder trained with a joint-embedding predictive objective on spatiotemporal echocardiogram clips. The flagship EchoJEPA-G uses a ViT-Giant backbone with approximately 1.1 billion parameters, pretrained on 18.1 million proprietary echocardiogram videos from about 300,000 patients. A reproducible public variant, EchoJEPA-L, uses a ViT-Large encoder (about 307M parameters) pretrained on the 525,000-video MIMIC-IV-Echo dataset. Clips span roughly two seconds, sampled at 8 fps with patch size 16 and tubelet size 2.
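The token layout implied by these settings can be sketched as a reshape. The 224×224 input resolution and three-channel frames are assumptions for illustration (the resolution is not stated above), and `tubelet_tokens` is a hypothetical helper; in a ViT, each flattened tubelet would then pass through a learned linear projection to the model's embedding width.

```python
import numpy as np

def tubelet_tokens(video, tubelet=2, patch=16):
    """Split a (T, H, W, C) clip into flattened tubelet tokens.

    Returns shape (T//tubelet * H//patch * W//patch, tubelet * patch * patch * C).
    """
    t, h, w, c = video.shape
    if t % tubelet or h % patch or w % patch:
        raise ValueError("clip dimensions must be divisible by tubelet and patch sizes")
    x = video.reshape(t // tubelet, tubelet, h // patch, patch, w // patch, patch, c)
    # Move the three grid axes to the front and the within-tubelet axes to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, tubelet * patch * patch * c)

clip = np.zeros((16, 224, 224, 3), dtype=np.float32)  # 224x224 is an assumed resolution
tokens = tubelet_tokens(clip)
```

With these assumed settings each clip becomes 8 × 14 × 14 = 1,568 tokens of 2 × 16 × 16 × 3 = 1,536 values each.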

Downstream evaluation spans internal cohorts (Toronto, about 150,000 studies; Chicago, about 60,000 studies) and public benchmarks (EchoNet-Dynamic, 10,030 videos; EchoNet-Pediatric, 3,316 videos). On LVEF estimation, EchoJEPA-G reaches a mean absolute error (MAE) of about 3.97 percentage points, and on RVSP estimation an MAE of about 4.54 mmHg, improving over leading baselines by roughly 20% and 17%, respectively. EchoJEPA-L achieves 85.5% view-classification accuracy. Robustness testing uses physics-informed perturbations (linear depth-attenuation ramps and Gaussian-weighted acoustic shadows of varying severity), under which the model's error rises by only about 2.3%.
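A minimal sketch of the two perturbation families named above, applied to a single grayscale frame whose rows index depth. The exact parameterization used in the paper is not given here, so the function names, the severity scale, and the shadow geometry (a vertical Gaussian band that ignores the fan-shaped sector of real echo images) are all assumptions.

```python
import numpy as np

def depth_attenuation(frame, severity):
    """Scale intensity by a linear ramp: 1.0 at the top row (near the transducer)
    falling to 1.0 - severity at the deepest row."""
    gain = np.linspace(1.0, 1.0 - severity, frame.shape[0])[:, None]
    return frame * gain

def acoustic_shadow(frame, severity, center_col, width, start_row=0):
    """Darken a Gaussian-weighted vertical band from start_row downward."""
    rows, cols = frame.shape
    weight = np.exp(-0.5 * ((np.arange(cols) - center_col) / width) ** 2)  # (cols,)
    below = (np.arange(rows) >= start_row)[:, None]                        # (rows, 1)
    return frame * (1.0 - severity * weight[None, :] * below)

# Placeholder frame (random intensities), rows = depth, columns = lateral position.
frame = np.random.default_rng(0).random((128, 96))
degraded = acoustic_shadow(
    depth_attenuation(frame, severity=0.3),
    severity=0.6, center_col=48, width=8.0, start_row=32,
)
```

To perturb a clip, the same functions would be applied frame by frame.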

#Applications

EchoJEPA serves as a pretrained backbone for cardiology and cardiac imaging research, where labeled echocardiography data is scarce and expensive to annotate. Its representations support automated estimation of functional measurements such as ejection fraction and right ventricular systolic pressure, automated view recognition for protocol triage and quality control, and rapid adaptation to new tasks with minimal labeled data. The strong zero-shot transfer to pediatric imaging makes it attractive for populations and centers where large labeled datasets do not exist, and its robustness to acoustic artifacts suits deployment across heterogeneous scanners and acquisition conditions.
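One simple way to adapt frozen representations with few labels is a linear (ridge) probe, sketched below. This is an assumed illustration, not EchoJEPA's evaluation protocol: random vectors stand in for pooled clip embeddings, the LVEF-like targets are synthetic, and `fit_ridge_probe` is a hypothetical helper. Only the small probe is fit; the backbone is never updated.

```python
import numpy as np

def fit_ridge_probe(features, targets, alpha=1.0):
    """Closed-form ridge regression on frozen embeddings, with an unpenalized bias."""
    mu, y_mu = features.mean(axis=0), targets.mean()
    x = features - mu
    w = np.linalg.solve(x.T @ x + alpha * np.eye(x.shape[1]), x.T @ (targets - y_mu))
    b = y_mu - mu @ w
    return w, b

def predict(features, w, b):
    return features @ w + b

def mae(pred, target):
    return float(np.mean(np.abs(pred - target)))

# Placeholder data: random vectors stand in for pooled clip embeddings, and a
# synthetic linear target stands in for LVEF labels (no real measurements).
rng = np.random.default_rng(0)
dim = 16
true_w = rng.normal(size=dim)
x_train = rng.normal(size=(50, dim))  # a small labeled set
y_train = 55 + x_train @ true_w
x_test = rng.normal(size=(200, dim))
y_test = 55 + x_test @ true_w

w, b = fit_ridge_probe(x_train, y_train, alpha=1.0)
test_mae = mae(predict(x_test, w, b), y_test)
```

A closed-form probe like this is cheap enough to refit per site or task; heavier options such as attentive probes or full fine-tuning trade that simplicity for capacity.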

#Impact

EchoJEPA demonstrates that latent predictive pretraining, rather than pixel reconstruction, is well matched to noise-dominated medical ultrasound, where much of the pixel signal is stochastic speckle. By assembling the largest reported echocardiography pretraining corpus and showing large gains in sample efficiency, robustness, and cross-population generalization, it provides a reusable foundation for cardiac ultrasound analysis and a template for applying JEPA-style objectives to other artifact-heavy imaging modalities. EchoJEPA is still a preprint, and its publicly released ViT-Large variant is trained on open MIMIC-IV-Echo data rather than on the proprietary corpus. Peer-reviewed clinical validation, and evidence that the proprietary-scale results generalize to external real-world deployment, remain to be established.

Citation

EchoJEPA: A Latent Predictive Foundation Model for Echocardiography

Preprint

Munim, A., et al. (2026) EchoJEPA: A Latent Predictive Foundation Model for Echocardiography. arXiv.org.

DOI: 10.48550/arXiv.2602.02603

Citations

Total Citations: 3

GitHub

Stars: 308
Forks: 50

Openness

Unclassified
Missing required components

Tags

cardiac_ultrasound, echocardiography, ejection_fraction_estimation, foundation_model, jepa, representation_prediction, self_supervised, view_classification, vision_transformer

Resources

GitHub Repository, Research Paper, Official Website