Multi-sequence spine-MRI foundation model with paired DINOv3 encoders, supporting 17-condition classification, pathology localization, image-report retrieval, and report generation.
Spine MRI is central to diagnosing back pain, spinal stenosis, trauma, and tumors, but its interpretation is slow and complex: radiologists must synthesize findings across multiple imaging sequences (T1-weighted, T2-weighted, STIR, Dixon) and anatomical levels. SpineAgent is a multi-sequence spine-MRI foundation model and accompanying multi-agent system that learns transferable visual representations from routine clinical imaging and applies them across the full spectrum of interpretation tasks, from condition classification to draft report generation.
Developed by Zhiping Xiao, Nathan M. Cross, Sheng Wang, and colleagues at the University of Washington (with collaborators at Peking University, UW–Madison, and NYU) and posted to bioRxiv in June 2026, SpineAgent is built on a self-supervised foundation model pretrained on one of the largest spine-MRI corpora reported to date: 32,047 patients, 453,683 series, and 13.4 million slices from University of Washington Medicine. Its core is a pair of DINOv3-based Vision Transformer encoders trained separately on T1- and T2-weighted data, which produce fixed patient-level embeddings that are reused across many downstream agents.
By decomposing radiology reporting into clinically grounded subtasks, each handled by a specialized agent that draws on the shared encoders, SpineAgent demonstrates that a single imaging foundation model can generalize across manufacturers and external cohorts, a recurring challenge for medical-imaging deep learning.
SpineAgent pretrains its two ViT encoders with DINOv3 on roughly 4.5 million T1 slices and 4.5 million T2 slices, respectively, then aligns image and text representations in a CLIP-style stage that uses a BiomedBERT language encoder. At inference, slice-level embeddings from the sequence-specific encoders (or from the synthesizer, for sequences other than T1 and T2) are concatenated and aggregated by an attention-pooling projector into a fixed set of patient-level image tokens. Across the 17 classification tasks, SpineAgent improves AUROC by 10.8% over the strongest baselines (with a 13.4% AUPRC gain) when using all available sequences, and continual training of the synthesizer yields an 11.1% AUROC improvement on non-T1/T2 sequences. On retrieval, it achieves a 56.4% relative improvement in Recall@5 over the next-best method on the UW Medicine dataset. Cross-manufacturer evaluation (training on one scanner vendor, testing across all) and cross-cohort evaluation on the external RSNA LumbarDISC cohort both show consistent gains, which suggests robustness to scanner and population shift.
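The aggregation step described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the single-head formulation, the learned query matrix, the projection matrices `w_key` and `w_value`, the toy dimensions, and the choice to concatenate along the slice axis are all assumptions made for illustration. It only shows how attention pooling can map a variable number of slice embeddings to a fixed number of patient-level tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(slice_embeddings, queries, w_key, w_value):
    """Pool (n_slices, d_in) slice embeddings into (n_tokens, d_model) tokens.

    queries:        (n_tokens, d_model) learned query vectors (hypothetical)
    w_key, w_value: (d_in, d_model) projection matrices (hypothetical)
    """
    keys = slice_embeddings @ w_key      # (n_slices, d_model)
    values = slice_embeddings @ w_value  # (n_slices, d_model)
    # Scaled dot-product scores between each query and each slice.
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    return weights @ values              # (n_tokens, d_model)

# Toy sizes chosen for illustration only.
rng = np.random.default_rng(0)
d_in, d_model, n_tokens = 16, 8, 4
queries = rng.normal(size=(n_tokens, d_model))
w_key = rng.normal(size=(d_in, d_model))
w_value = rng.normal(size=(d_in, d_model))

# Stand-ins for slice embeddings from the T1 and T2 encoders.
t1_slices = rng.normal(size=(12, d_in))
t2_slices = rng.normal(size=(20, d_in))
all_slices = np.concatenate([t1_slices, t2_slices], axis=0)

patient_tokens = attention_pool(all_slices, queries, w_key, w_value)
print(patient_tokens.shape)  # (4, 8)
```

Because the number of output tokens is set by the number of queries rather than the number of slices, studies with different slice counts or different sets of sequences yield outputs of the same shape, which is what lets downstream components consume a fixed-size patient representation.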
SpineAgent is aimed at radiology workflows for spine imaging: it can triage and classify spinal pathology, highlight the slices and regions most relevant to a suspected condition, retrieve similar prior cases or matching reports, and generate a structured draft report to accelerate read times. Its patient-level embeddings also serve as a reusable representation for researchers building downstream spine-MRI models without retraining an encoder from scratch. Because the encoders generalize across manufacturers and to an external cohort, the system is relevant to multi-site clinical and research settings where imaging hardware and protocols vary.
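As an illustration of how retrieval over such embeddings is commonly scored, the following sketch computes Recall@K with cosine similarity. It assumes that image and report embeddings already live in a shared space and that row i of each matrix belongs to the same study; the function names and toy data are hypothetical and do not come from the SpineAgent codebase.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def recall_at_k(image_emb, text_emb, k=5):
    """Fraction of images whose paired report (same row index) ranks in the top k."""
    sims = l2_normalize(image_emb) @ l2_normalize(text_emb).T  # (n_images, n_reports)
    top_k = np.argsort(-sims, axis=1)[:, :k]                   # best-matching report indices
    targets = np.arange(sims.shape[0])[:, None]
    return float((top_k == targets).any(axis=1).mean())

# Toy data: report embeddings are slightly perturbed copies of the image embeddings.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(6, 32))
text_emb = image_emb + 0.01 * rng.normal(size=(6, 32))
score = recall_at_k(image_emb, text_emb, k=5)
print(score)
```

The same similarity matrix can be read row-wise to fetch the closest reports for a given study, or column-wise to fetch the closest studies for a given report.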
SpineAgent shows that self-supervised foundation modeling on large routine clinical corpora can unify spine-MRI interpretation tasks that were previously addressed by separate, narrowly trained models, while generalizing across the manufacturer and cohort shifts that often degrade medical imaging models. The training pipeline (DINOv3 encoders, CLIP alignment, synthesizer routing, and the report-generation stack) is released under Apache-2.0 on GitHub. However, the work is described in a June 2026 bioRxiv preprint that has not yet been peer reviewed. The underlying clinical imaging data cannot be shared for privacy reasons, and pretrained weights are not yet publicly available, which currently limits external reproducibility and direct clinical deployment.
Xiao, Z., et al. (2026) A multi-agent system for spine MRI report generation from multi-sequence imaging. bioRxiv.
DOI: 10.64898/2026.06.07.730703