EchoJEPA

Joint-embedding predictive foundation model for echocardiography, pretrained on 18M cardiac ultrasound videos for artifact-robust representations.

Released: February 2026

Parameters: 1.1 Billion

EchoJEPA is a self-supervised foundation model for echocardiography (cardiac ultrasound) developed by Alif Munim, Adibvafa Fallahpour, Teodora Szasz, Bo Wang, and colleagues at the University of Toronto, the Vector Institute, and the University Health Network. It was released as a preprint in February 2026 (arXiv:2602.02603). The model addresses a long-standing difficulty in ultrasound machine learning: echocardiograms are dominated by speckle, acoustic shadowing, and operator-dependent artifacts that confound pixel-reconstruction objectives and limit the transferability of supervised models across scanners and patient populations.

Rather than learning to reconstruct raw pixels, EchoJEPA adopts a Joint-Embedding Predictive Architecture (JEPA), in which the model predicts the latent representations of masked spatiotemporal regions from visible context. This latent predictive objective encourages the encoder to capture stable cardiac anatomy and motion while discarding stochastic ultrasound noise that has no predictable structure. EchoJEPA adapts the video JEPA paradigm (V-JEPA2) to the temporal characteristics of cardiac imaging, using higher frame sampling to resolve the rapid dynamics of the beating heart.
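The latent predictive objective described above can be illustrated with a toy sketch. This is not the EchoJEPA training code: the function names (`masked_latent_l1`, `ema_update`), the choice of an L1 distance, and the momentum-updated target encoder are assumptions borrowed from the general JEPA family, and the embeddings are made-up numbers.

```python
# Toy illustration of a JEPA-style latent prediction loss.
# Assumed form for illustration only, not the authors' implementation.

def masked_latent_l1(predicted, target, mask):
    """Mean absolute error between predicted and target embeddings,
    averaged over masked token positions only."""
    total, count = 0.0, 0
    for pred_tok, tgt_tok, is_masked in zip(predicted, target, mask):
        if not is_masked:
            continue  # visible tokens are context, not prediction targets
        for p, t in zip(pred_tok, tgt_tok):
            total += abs(p - t)
            count += 1
    return total / count if count else 0.0


def ema_update(target_weights, online_weights, momentum=0.99):
    """Move target-encoder weights slowly toward the online encoder (assumed scheme)."""
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_weights, online_weights)]


# Four tokens with 2-dimensional embeddings; tokens 1 and 3 are masked.
target = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.5, 0.5]]
predicted = [[9.0, 9.0], [1.0, 0.0], [9.0, 9.0], [0.5, 1.5]]
mask = [False, True, False, True]

loss = masked_latent_l1(predicted, target, mask)
print(loss)
```

Errors at the visible positions (tokens 0 and 2) do not contribute to the loss, which is the point of the sketch: the model is only scored on how well it predicts the representation of what it could not see, never on pixel values.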

Trained on 18 million echocardiogram videos from roughly 300,000 patients (described as the largest pretraining corpus assembled for this modality), EchoJEPA produces general-purpose representations that transfer to clinical tasks including left ventricular ejection fraction (LVEF) estimation, right ventricular systolic pressure (RVSP) estimation, and echocardiographic view classification, with strong sample efficiency and cross-population generalization.

Key Features

Latent predictive (JEPA) objective: By predicting masked regions in representation space rather than pixel space, EchoJEPA learns anatomical and motion features while ignoring speckle noise and acoustic artifacts that defeat reconstruction-based pretraining.
Cardiac-tuned video modeling: The model increases temporal sampling (24 fps acquisition, 16 frames per clip at 8 fps, tubelet size 2) relative to V-JEPA2's default settings to capture the fast dynamics of the cardiac cycle.
Extreme sample efficiency: EchoJEPA reaches 79% view-classification accuracy using only 1% of labeled data, compared with 42% for the best baseline trained on the full labeled set.
Robustness to acoustic degradation: Under physics-informed perturbations (depth attenuation and acoustic shadowing), the model degrades by roughly 2% versus about 17% for competing approaches, reflecting representations grounded in anatomy rather than image texture.
Cross-population generalization: Zero-shot transfer to pediatric echocardiograms surpasses fully fine-tuned baselines, indicating representations that generalize beyond the adult populations seen in pretraining.
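The sampling figures quoted under cardiac-tuned video modeling imply some simple arithmetic, sketched below. The helper name `clip_frame_indices` and the fixed-stride subsampling scheme are assumptions for illustration; the source does not say how frames are selected from the 24 fps acquisition.

```python
def clip_frame_indices(source_fps=24, target_fps=8, frames_per_clip=16, start=0):
    """Indices of source frames kept when subsampling at a fixed stride.

    Hypothetical helper: assumes the 8 fps clip is obtained by keeping
    every (source_fps // target_fps)-th frame of the acquisition.
    """
    if source_fps % target_fps != 0:
        raise ValueError("this sketch only handles integer strides")
    stride = source_fps // target_fps
    return [start + i * stride for i in range(frames_per_clip)]


indices = clip_frame_indices()
clip_seconds = 16 / 8  # frames per clip divided by sampling rate

print(indices[:4], "...", indices[-1])
print(clip_seconds)
```

Under these assumptions every third acquired frame is kept, and a 16-frame clip covers about two seconds of the recording.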

Technical Details

EchoJEPA is built on a Vision Transformer encoder trained with a joint-embedding predictive objective on spatiotemporal echocardiogram clips. The flagship EchoJEPA-G uses a ViT-Giant backbone with approximately 1.1 billion parameters, pretrained on 18.1 million proprietary echocardiogram videos from about 300,000 patients. A reproducible public variant, EchoJEPA-L, uses a ViT-Large encoder (about 307M parameters) pretrained on the 525,000-video MIMIC-IV-Echo dataset. Clips span roughly two seconds, sampled at 8 fps with patch size 16 and tubelet size 2.
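To make the patch and tubelet sizes concrete, the sketch below counts the spatiotemporal tokens a Vision Transformer would see for one clip. The 224 x 224 input resolution is an assumption (the source does not state the frame size), and `num_tokens` is a hypothetical helper name.

```python
def num_tokens(frames, height, width, patch=16, tubelet=2):
    """Number of non-overlapping tubelet tokens in a video clip.

    Each token covers `tubelet` consecutive frames and a patch x patch
    spatial region.
    """
    if frames % tubelet or height % patch or width % patch:
        raise ValueError("clip dimensions must be divisible by the token size")
    return (frames // tubelet) * (height // patch) * (width // patch)


# 16 frames per clip is from the text; 224 x 224 is an assumed resolution.
tokens = num_tokens(frames=16, height=224, width=224)
print(tokens)
```

With a different input resolution the spatial factor changes, but the temporal factor (16 frames grouped into tubelets of 2) follows directly from the stated settings.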

Downstream evaluation spans internal cohorts (Toronto, about 150,000 studies; Chicago, about 60,000 studies) and public benchmarks (EchoNet-Dynamic, 10,030 videos; EchoNet-Pediatric, 3,316 videos). On LVEF estimation, EchoJEPA-G reaches a mean absolute error (MAE) of about 3.97 ejection-fraction percentage points, and on RVSP estimation an MAE of about 4.54 mmHg, improving over leading baselines by roughly 20% and 17% respectively. EchoJEPA-L achieves 85.5% view-classification accuracy. Robustness testing uses physics-informed perturbations (linear depth-attenuation ramps and Gaussian-weighted acoustic shadows of varying severity), under which the model's error rises by only about 2.3%.
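The two perturbation families named above can be sketched on a toy image. The formulas, parameter names, and the meaning of `severity` below are assumptions chosen to match the verbal description (a linear ramp with depth, a Gaussian-weighted dark band); they are not taken from the paper, and a real ultrasound shadow would follow the scan-line geometry of the probe rather than a straight vertical band.

```python
import math


def depth_attenuation(frame, severity):
    """Darken rows linearly with depth: gain falls from 1.0 at the top row
    to (1 - severity) at the bottom row. Assumed form."""
    rows = len(frame)
    out = []
    for r, row in enumerate(frame):
        depth = r / (rows - 1) if rows > 1 else 0.0
        gain = 1.0 - severity * depth
        out.append([pixel * gain for pixel in row])
    return out


def acoustic_shadow(frame, center_col, width, severity):
    """Darken a vertical band whose strength follows a Gaussian profile
    across columns, centered on `center_col`. Assumed form."""
    out = []
    for row in frame:
        new_row = []
        for c, pixel in enumerate(row):
            weight = math.exp(-0.5 * ((c - center_col) / width) ** 2)
            new_row.append(pixel * (1.0 - severity * weight))
        out.append(new_row)
    return out


# A uniform 3 x 5 "frame" with intensities in [0, 1].
frame = [[1.0] * 5 for _ in range(3)]
attenuated = depth_attenuation(frame, severity=0.5)
shadowed = acoustic_shadow(frame, center_col=2, width=1.0, severity=0.8)

print([row[0] for row in attenuated])          # brightness by depth
print([round(p, 3) for p in shadowed[0]])      # brightness across columns
```

Sweeping `severity` and re-running a downstream evaluation on the perturbed clips is one way to obtain a degradation curve of the kind summarized in the text.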

Applications

EchoJEPA serves as a pretrained backbone for cardiology and cardiac imaging research, where labeled echocardiography data is scarce and expensive to annotate. Its representations support automated estimation of functional measurements such as ejection fraction and right ventricular systolic pressure, automated view recognition for protocol triage and quality control, and rapid adaptation to new tasks with minimal labeled data. The strong zero-shot transfer to pediatric imaging makes it attractive for populations and centers where large labeled datasets do not exist, and its robustness to acoustic artifacts suits deployment across heterogeneous scanners and acquisition conditions.
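As a rough illustration of adapting a frozen backbone with very few labels, the sketch below fits a nearest-centroid classifier on precomputed embeddings. Everything here is hypothetical: the source does not specify the probing method, the embedding values are invented, and the view labels ("A4C", "PLAX") are used only as example class names.

```python
def fit_centroids(embeddings, labels):
    """Average the embeddings of each class to obtain one centroid per label."""
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


# Invented 2-dimensional "embeddings" standing in for frozen encoder outputs.
train_embeddings = [[0.9, 0.1], [1.1, -0.1], [-1.0, 0.2], [-0.8, 0.0]]
train_labels = ["A4C", "A4C", "PLAX", "PLAX"]

centroids = fit_centroids(train_embeddings, train_labels)
prediction = predict(centroids, [0.7, 0.3])
print(prediction)
```

Because the encoder stays frozen, only the centroids depend on the labeled data, which is why this style of adaptation can work with a handful of annotated studies per class.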

Impact

EchoJEPA demonstrates that latent predictive pretraining, rather than pixel reconstruction, is well matched to noise-dominated medical ultrasound, where much of the pixel signal is stochastic speckle. By assembling the largest reported echocardiography pretraining corpus and showing large gains in sample efficiency, robustness, and cross-population generalization, it provides a reusable foundation for cardiac ultrasound analysis and a template for applying JEPA-style objectives to other artifact-heavy imaging modalities. The work is currently a preprint, with a publicly released ViT-Large variant trained on open MIMIC-IV-Echo data. Peer-reviewed clinical validation has yet to be reported, and it remains to be established whether the results obtained at proprietary scale generalize to external, real-world deployment.

Citation

EchoJEPA: A Latent Predictive Foundation Model for Echocardiography

Preprint

Munim, A., et al. (2026). EchoJEPA: A Latent Predictive Foundation Model for Echocardiography. arXiv preprint arXiv:2602.02603.

DOI: 10.48550/arXiv.2602.02603

Recent citations

Papers that recently cited this model.

Medical world models: representing medical states, modelling clinical dynamics and guiding intervention policies
Ke Liu, Mengxuan Li, Yanyi Bao, et al.
Jun 2026
Citations: 0

EchoXFlow: A Beamspace Echocardiography Dataset for Cardiac Motion, Flow, and Function
E. Stenhede, J. Sulkowska, E. B. Orstad, et al.
May 2026
Citations: 0

Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity
Blaise Delaney, Salil Patel, Yujian Xing, et al.
Apr 2026
Citations: 0 (Influential)

Top citations

The most-cited papers that cite this model.

LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
Lucas Maes, Quentin Le Lidec, Damien Scieur, et al.
Mar 2026
Citations: 50

EchoXFlow: A Beamspace Echocardiography Dataset for Cardiac Motion, Flow, and Function
E. Stenhede, J. Sulkowska, E. B. Orstad, et al.
May 2026
Citations: 0

Sonata: A Hybrid World Model for Inertial Kinematics under Clinical Data Scarcity
Blaise Delaney, Salil Patel, Yujian Xing, et al.
Apr 2026
Citations: 0 (Influential)

Medical world models: representing medical states, modelling clinical dynamics and guiding intervention policies
Ke Liu, Mengxuan Li, Yanyi Bao, et al.
Jun 2026
Citations: 0

Citations

Total Citations: 4

Influential: 1

References: 40

GitHub

Stars: 328

Forks: 57

Open Issues: 9

Contributors: 2

Last Push: 1 month ago

Language: Python

License: Apache-2.0

Fields of citing research

Computer Science: 100%
Medicine: 75%
Engineering: 50%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible

Score: 62 (Partial)

Usability (can I run it?): 67

Reproducibility (can I retrain it?): 57

Model Openness Framework: Unclassified (missing required components)

Resources

GitHub Repository · Research Paper · Official Website

