DIA-CLIP

AI for Science Institute / Peking University

Contrastive dual-encoder model for DIA proteomics, embedding peptides and spectra in a shared space for zero-shot peptide-spectrum matching.

Released: April 2026

Data-independent acquisition mass spectrometry (DIA-MS) has become a cornerstone of large-scale proteomic profiling, prized for the depth and reproducibility of its measurements. A central computational step is the peptide-spectrum match (PSM): deciding which peptide produced a given fragmentation spectrum. Conventional DIA analysis tools re-score candidate PSMs with a semi-supervised classifier trained separately within each experimental run, an approach that is prone to overfitting and generalizes poorly across species, instruments, and acquisition conditions.

DIA-CLIP, introduced in early 2026 by researchers at the AI for Science Institute (Beijing) and Peking University's Center for Machine Learning Research, reframes PSM inference as a cross-modal representation-learning problem rather than a per-run classification task. Borrowing the contrastive image-text pretraining idea behind CLIP, it trains two encoders—one for peptide sequences and one for the corresponding DIA spectral features—to embed both modalities into a shared space where a peptide and its true spectrum lie close together. Once pretrained, the model scores matches in a zero-shot manner, without any run-specific fine-tuning.
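To make the zero-shot scoring idea concrete, the following is a minimal sketch of how candidate peptides could be ranked against one spectrum once both are embedded in the shared space. It assumes cosine similarity is the match score, which is the usual choice in CLIP-style models but is not confirmed by the source. The embeddings, peptide names, and function names are hypothetical toy values, not outputs or APIs of DIA-CLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(spectrum_emb, candidates):
    """Score every candidate peptide embedding against one spectrum embedding.

    candidates maps a peptide label to its embedding.
    Returns (peptide, score) pairs sorted from best to worst match.
    """
    scored = [(pep, cosine_similarity(spectrum_emb, emb)) for pep, emb in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical 3-dimensional embeddings; a real model would use far more dimensions.
spectrum_emb = [1.0, 0.0, 0.0]
candidates = {
    "PEPTIDEK": [0.9, 0.1, 0.0],   # nearly parallel to the spectrum embedding
    "SAMPLER": [0.0, 1.0, 0.0],    # orthogonal
    "DECOYK": [-1.0, 0.0, 0.0],    # opposite direction
}

ranking = rank_candidates(spectrum_emb, candidates)
for peptide, score in ranking:
    print(f"{peptide}: {score:.3f}")
```

No classifier is fitted on the run being analyzed: the ranking depends only on the pretrained embeddings, which is what "zero-shot" means in this setting.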

DIA-CLIP is among the first attempts to bring large-scale, transferable foundation-model pretraining to DIA scoring, placing it alongside the broader wave of contrastive and language-model approaches now reshaping computational mass spectrometry.

Key Features

Zero-shot PSM inference: After a single pretraining stage, DIA-CLIP scores peptide-spectrum matches directly on new data without the per-run semi-supervised retraining that conventional pipelines require, improving robustness and generalization.
Dual-encoder contrastive framework: Separate peptide and spectrum encoders are aligned with a CLIP-style contrastive objective, producing a unified cross-modal embedding in which matching peptides and spectra are mutually nearest neighbors.
Encoder-decoder design: The contrastive backbone is paired with an encoder-decoder component that strengthens the learned representation for high-precision matching.
Higher identification depth at controlled error: The approach increases the number of confidently identified peptides and proteins while tightening control over false matches, expanding usable proteome coverage.
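The CLIP-style contrastive objective named in the features above can be sketched as a symmetric cross-entropy over a batch similarity matrix. This is the standard formulation from the original CLIP work and is assumed here; the source does not give DIA-CLIP's exact loss, and the temperature value and function names are illustrative assumptions.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in row))
    return [v - log_sum for v in row]


def clip_style_loss(similarity, temperature=0.07):
    """Symmetric contrastive loss for a square similarity matrix.

    similarity[i][j] is the similarity of peptide i and spectrum j;
    true peptide-spectrum pairs are assumed to sit on the diagonal.
    temperature=0.07 is an assumed value, not one reported for DIA-CLIP.
    """
    n = len(similarity)
    logits = [[v / temperature for v in row] for row in similarity]
    # Peptide-to-spectrum direction: each row should peak on its diagonal entry.
    loss_p2s = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Spectrum-to-peptide direction: each column should peak on its diagonal entry.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_s2p = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_p2s + loss_s2p)


aligned = [[1.0, 0.0], [0.0, 1.0]]    # true pairs are the most similar
shuffled = [[0.0, 1.0], [1.0, 0.0]]   # true pairs are the least similar
uniform = [[0.5, 0.5], [0.5, 0.5]]    # no preference either way

print(clip_style_loss(aligned) < clip_style_loss(shuffled))
```

Minimizing this loss pulls each peptide toward its own spectrum and pushes it away from the other spectra in the batch, which is what produces the mutual nearest-neighbor structure described above.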

Technical Details

DIA-CLIP integrates a dual-encoder contrastive learning framework with an encoder-decoder architecture to build a unified representation of peptides and their DIA spectral features. The peptide encoder ingests amino-acid sequences while the spectrum encoder processes the associated fragment-ion features, and a contrastive objective aligns the two modalities so that correct peptide-spectrum pairs are scored highest. Pretraining draws on large-scale DIA spectral data spanning multiple datasets, enabling the model to transfer across experimental conditions. In benchmark evaluations against state-of-the-art DIA tools, DIA-CLIP delivered up to a 45% increase in protein identifications while simultaneously achieving a 12% reduction in entrapment (false) identifications, indicating that the added depth does not come at the cost of error control. The work is described in a 21-page preprint (5 figures) posted to bioRxiv and arXiv in 2026.
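As a reading aid for the benchmark figures, the sketch below shows how a "45% increase" and a "12% reduction" would arise if both are plain relative changes in counts against a baseline tool. That interpretation is an assumption, since the source does not state how the percentages were computed, and every count below is hypothetical, chosen only to reproduce the reported percentages.

```python
def relative_change(new, baseline):
    """Return (new - baseline) / baseline as a fraction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Hypothetical counts, not taken from the preprint.
baseline_proteins, dia_clip_proteins = 4000, 5800
baseline_entrapment, dia_clip_entrapment = 50, 44

protein_gain = relative_change(dia_clip_proteins, baseline_proteins)
entrapment_change = relative_change(dia_clip_entrapment, baseline_entrapment)

print(f"{protein_gain:+.0%} protein identifications")
print(f"{entrapment_change:+.0%} entrapment identifications")
```

Entrapment sequences are known to be absent from the sample, so any identification of one is a false match by construction. A drop in entrapment hits alongside a rise in total identifications is what supports the claim that depth was not gained by loosening error control.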

Applications

DIA-CLIP is aimed at proteomics researchers who analyze DIA-MS experiments and need deeper, more reliable peptide and protein identifications without tuning a scoring model for every run. Its zero-shot generalization is especially valuable in low-input and heterogeneous settings such as single-cell and spatial proteomics, where limited material and variable spectra strain conventional re-scoring pipelines. By recovering more identifications at a controlled error rate, the model can help surface novel biomarkers and resolve cellular mechanisms that shallower analyses miss.

Impact

By shifting DIA scoring from per-run semi-supervised training toward transferable cross-modal pretraining, DIA-CLIP demonstrates that foundation-model and contrastive-learning techniques can meaningfully advance computational mass spectrometry, a domain historically dominated by hand-engineered, run-specific classifiers. The reported gains, up to 45% more protein identifications with 12% fewer entrapment identifications, are substantial for a mature field where incremental improvements are typical. As of mid-2026 the model is described only in a preprint that has not yet been peer-reviewed, and no public code or pretrained weights have been released, which currently limits independent reproduction and adoption. If the code and weights are released openly, the cross-modal representation-learning paradigm it introduces could influence the next generation of DIA analysis tools.

Citation

DIA-CLIP: a universal representation learning framework for zero-shot DIA proteomics

Liao, Y., et al. (2026) DIA-CLIP: a universal representation learning framework for zero-shot DIA proteomics. bioRxiv.

DOI: 10.64898/2026.02.09.704949


Citations

Total Citations: 0

Influential: 0

References: 58


Openness

bio.rodeo openness: Closed (low usability and reproducibility)

Score: 11 (Closed)

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 14

Model Openness Framework: Unclassified. Restrictive license on core components.

Resources

Research Paper
