Suiren-1.0

Molecular foundation models pretrained on density functional theory data, encoding 3D geometry and quantum behavior for ADMET and drug discovery.

Released: March 2026

Parameters: 1.8 Billion

Suiren-1.0 is a family of molecular foundation models for organic chemistry, developed by Golab (the SAIS Physics Lab at the Shanghai Academy of Artificial Intelligence for Science) and described in a technical report first posted to arXiv in March 2026. The models are grounded in quantum chemistry: rather than learning from 2D molecular graphs or string representations alone, Suiren is pretrained on large-scale density functional theory (DFT) data so that its representations encode physically meaningful information about energies, forces, and 3D molecular geometry.

The central problem Suiren addresses is the gap between microscopic 3D conformational geometry and the macroscopic, ensemble-averaged properties that matter for downstream tasks such as ADMET prediction in drug discovery. Many practical chemistry workflows operate on 2D graphs or SMILES strings, yet the properties of interest are governed by 3D structure and quantum-mechanical behavior. Suiren bridges this gap through three coordinated variants and a distillation framework that transfers 3D structural knowledge into models that accept 2D inputs.
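To make "ensemble-averaged" concrete, the sketch below computes a Boltzmann-weighted average of a per-conformer property. This is a standard statistical-mechanics illustration, not a description of how Suiren-ConfAvg weights conformers (the source does not specify that). The function names, the temperature, and the example numbers are all assumptions chosen for illustration.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def boltzmann_weights(energies_kcal, temperature_k=298.15):
    """Normalized Boltzmann weights for conformer energies in kcal/mol."""
    beta = 1.0 / (K_B_KCAL * temperature_k)
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    factors = [math.exp(-beta * (e - e_min)) for e in energies_kcal]
    total = sum(factors)
    return [f / total for f in factors]

def ensemble_average(values, energies_kcal, temperature_k=298.15):
    """Weighted mean of a per-conformer property over the ensemble."""
    weights = boltzmann_weights(energies_kcal, temperature_k)
    return sum(w * v for w, v in zip(weights, values))

# Illustrative numbers only (hypothetical, not taken from the report):
energies = [0.0, 0.5, 2.0]   # relative conformer energies, kcal/mol
dipoles = [1.2, 2.0, 3.5]    # a per-conformer property, arbitrary units
weights = boltzmann_weights(energies)
avg = ensemble_average(dipoles, energies)
```

Low-energy conformers dominate the average, so a single arbitrary 3D geometry can be a poor proxy for the macroscopic value.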

The family comprises Suiren-Base (a 1.8-billion-parameter equivariant backbone pretrained on quantum-chemical conformers), Suiren-Dimer (continued pretraining on intermolecular-interaction data), and Suiren-ConfAvg (a lightweight distilled model that produces conformation-averaged embeddings from 2D graphs or SMILES). Together they target accurate, transferable molecular property prediction for both single molecules and interacting pairs.

Key Features

Quantum-chemistry-grounded pretraining: Suiren-Base is trained on the Qo2mol dataset of roughly 70 million DFT conformers (B3LYP/def2-SVP level), exposing the model to energies, forces, and trajectory information rather than sequence or graph data alone.
SE(3)-equivariant architecture: The backbone combines an EquiformerV2-style equivariant graph network with an Equivariant Spherical Transformer (EST) and a mixture-of-experts design, so predictions respect the rotational and translational symmetries of molecular systems.
Three coordinated variants: Suiren-Base handles single-molecule quantum properties, Suiren-Dimer is specialized for intermolecular interactions, and Suiren-ConfAvg distills 3D knowledge into a compact model usable from 2D graphs or SMILES.
Conformation Compression Distillation (CCD): A diffusion-based framework that compresses complex 3D structural representations into 2D conformation-averaged embeddings, letting downstream users benefit from 3D-aware features without supplying 3D coordinates.
Open weights on HuggingFace: All three checkpoints are released under a Modified MIT license, with a documented ModelLoader API in the official repository for loading weights and running inference.
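
The symmetry property named in the architecture feature above can be checked numerically. The sketch below uses a toy harmonic pair potential as a stand-in for a learned model (an assumption; it is not Suiren's architecture) and verifies that the energy is unchanged under a rotation plus translation while the forces rotate with the molecule. All function names and the geometry are hypothetical.

```python
import math

def pair_energy_and_forces(coords, r0=1.0):
    """Toy model: E = sum over pairs of 0.5*(r_ij - r0)^2, with F = -dE/dx."""
    n = len(coords)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            energy += 0.5 * (r - r0) ** 2
            for k in range(3):
                f = -(r - r0) * d[k] / r  # force on atom i from pair (i, j)
                forces[i][k] += f
                forces[j][k] -= f         # equal and opposite force on atom j
    return energy, forces

def rotate_z(v, theta):
    """Rotate a 3-vector about the z axis by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]]

def transform(coords, theta, shift):
    """Apply a rotation followed by a translation to every atom."""
    return [[a + b for a, b in zip(rotate_z(p, theta), shift)] for p in coords]

# Water-like toy geometry (illustrative coordinates, not reference data)
coords = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
theta, shift = 0.7, [1.0, -2.0, 0.5]

e0, f0 = pair_energy_and_forces(coords)
e1, f1 = pair_energy_and_forces(transform(coords, theta, shift))

# Energy is invariant; forces are rotated but not translated.
energy_invariant = math.isclose(e0, e1, abs_tol=1e-9)
forces_equivariant = all(
    math.isclose(a, b, abs_tol=1e-9)
    for fa, fb in zip(f0, f1)
    for a, b in zip(rotate_z(fa, theta), fb)
)
```

An equivariant network guarantees this behavior by construction, whereas a non-equivariant model would have to learn it from augmented data.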

Technical Details

Suiren-Base is a 1.8-billion-parameter SO(3)-equivariant graph neural network built on an EquiformerV2 backbone augmented with the Equivariant Spherical Transformer. It uses a mixture-of-experts design, reported as 20 layers that each combine S2Activation and EST experts, and is pretrained with Equivariant Masked Position Prediction (EMPP), a self-supervised objective in which atoms are removed and their coordinates are reconstructed conditioned on atom type and target energy. Pretraining draws on the Qo2mol dataset of approximately 70 million DFT conformers; Suiren-Dimer adds continued pretraining on roughly 13.5 million intermolecular-interaction samples.

On the MoleHB benchmark, Suiren reports state-of-the-art mean absolute error on 41 of 43 properties, with gains exceeding 20% on more than 20 tasks. On the Therapeutics Data Commons (TDC) ADMET group, it reports top-ranked results on 8 of 18 metrics and second place on 4 more, achieved with a single fixed training configuration rather than per-task hyperparameter tuning.
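
The data-preparation side of the masked-position objective can be sketched as follows. This is a minimal illustration of "remove an atom, keep its type and the target energy as conditioning, and use its coordinates as the reconstruction target". The masking policy, the loss, and every name here (`MaskedSample`, `make_masked_sample`, `position_error`) are assumptions for illustration and are not taken from the report or its repository.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class MaskedSample:
    context_types: list   # atomic numbers of the atoms left in place
    context_coords: list  # 3D coordinates of those atoms
    masked_type: int      # type of the removed atom (conditioning input)
    energy: float         # energy of the full conformer (conditioning input)
    target_coord: list    # position the model is asked to reconstruct

def make_masked_sample(atom_types, coords, energy, rng):
    """Remove one randomly chosen atom and return the resulting training sample."""
    idx = rng.randrange(len(atom_types))
    return MaskedSample(
        context_types=[z for i, z in enumerate(atom_types) if i != idx],
        context_coords=[list(c) for i, c in enumerate(coords) if i != idx],
        masked_type=atom_types[idx],
        energy=energy,
        target_coord=list(coords[idx]),
    )

def position_error(pred, target):
    """Euclidean distance between predicted and true position (placeholder metric)."""
    return math.dist(pred, target)

# Illustrative water-like conformer; the energy value is a placeholder.
atom_types = [8, 1, 1]
coords = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
sample = make_masked_sample(atom_types, coords, energy=-76.4, rng=random.Random(0))
```

A model trained on such samples would map the context atoms, `masked_type`, and `energy` to a predicted position, which is then scored against `target_coord`.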

Applications

Suiren is aimed at computational chemists and drug-discovery teams who need accurate molecular property predictions across the ADMET spectrum (absorption, distribution, metabolism, excretion, and toxicity), as well as researchers studying quantum-chemical properties and intermolecular interactions. Because Suiren-ConfAvg accepts 2D graphs or SMILES, it slots into standard cheminformatics pipelines while retaining 3D-aware structural knowledge, making it practical for virtual screening and lead optimization. Suiren-Dimer extends the family to interaction-dependent tasks such as binding and association behavior between molecular pairs.

Impact

Suiren contributes to a growing class of molecular foundation models that learn from DFT-level quantum data rather than relying solely on 2D structure or empirical labels. Its strong results on TDC ADMET and MoleHB, obtained with a single fixed training configuration rather than per-task hyperparameter tuning, suggest that conformer-scale quantum pretraining yields broadly transferable representations for downstream chemistry. The open release of all three checkpoints under a Modified MIT license lowers the barrier to adoption in drug-discovery research. As a technical report, the work has not undergone peer review, and the full Qo2mol pretraining corpus has not yet been completely open-sourced, which constrains exact reproduction of the pretraining stage.

Citation

An, J., et al. (2026). Suiren-1.0 Technical Report: A Family of Molecular Foundation Models. Preprint.

DOI: 10.48550/arXiv.2603.21942

Citations

Total Citations: 4

Influential: 0

References: 89

GitHub

Stars: 17

Forks: 1

Open Issues: 1

Contributors: 2

Last Push: 27d ago

Language: Python

HuggingFace

Downloads: 0

Likes: 2

Last Modified: 3mo ago

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

Score: 46 (Partial)

Usability (can I run it?): 87

Reproducibility (can I retrain it?): 8

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository, Research Paper, HuggingFace Model (three links)
