SELFormerMM

Multimodal molecular foundation model fusing SELFIES, structural graphs, text, and knowledge graphs via contrastive pretraining for property prediction.

Released: March 2026

Molecular representation learning underpins much of modern computational drug discovery, where the goal is to predict properties such as toxicity, solubility, or bioactivity from a molecule's structure. Most existing models, however, rely on a single view of a molecule, typically a sequence notation like SMILES or a 2D structural graph, and therefore miss complementary information available in other modalities such as natural-language descriptions or curated biological knowledge graphs.

SELFormerMM, developed by the HUBioDataLab at Hacettepe University and posted in March 2026, is a multimodal molecular foundation model that integrates four modalities: SELFIES sequence notations, structural graphs, textual descriptions, and knowledge-graph-derived biological interaction data. It extends the earlier SELFormer chemical language model by aligning these modality-specific representations through contrastive pretraining on roughly three million molecules, producing embeddings that downstream models can use for property prediction.

By fusing chemical structure with biological context drawn from a knowledge graph, SELFormerMM aims to capture aspects of molecules that pure structure-based models overlook, and the authors report improved performance over single-modality alternatives on molecular property tasks.

Key Features

Four-modality integration: Combines SELFIES sequences, structural graphs, text descriptions, and knowledge-graph embeddings in a shared representation space.
Contrastive alignment: A supervised contrastive objective aligns the modality-specific branches so that complementary signals reinforce one another.
Built on SELFormer: The sequence branch reuses the RoBERTa-based SELFormer chemical language model, extending an established backbone rather than training from scratch.
Biological knowledge grounding: Knowledge-graph embeddings from a biological interaction graph inject context beyond chemical structure, such as known target relationships.
Open code and checkpoints: Code is released under GPL-3.0, with a pretrained checkpoint and datasets distributed through Hugging Face.
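
The contrastive alignment feature can be illustrated with a small NumPy sketch. It assumes that embeddings of the same molecule from different modalities share a label and count as positives, while embeddings of other molecules act as negatives. The function name, the temperature value, and the toy vectors are illustrative and not taken from the SELFormerMM code. The denominator uses a SINCERE-style form (one positive plus the negatives), which is one reading of the loss named under Technical Details rather than a confirmed reproduction of it.

```python
import numpy as np


def supervised_contrastive_loss(z, labels, temperature=0.1):
    """SINCERE-style supervised contrastive loss (illustrative sketch).

    z      : (n, d) array of embeddings, one row per (molecule, modality) pair.
    labels : (n,) array; rows with equal labels are treated as positives.
    """
    z = np.asarray(z, dtype=np.float64)
    labels = np.asarray(labels)
    # Cosine similarity: L2-normalise the rows, then take dot products.
    z = z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    logits = (z @ z.T) / temperature

    same = labels[:, None] == labels[None, :]
    pos_mask = same & ~np.eye(len(labels), dtype=bool)  # positives, excluding self
    neg_mask = ~same                                     # rows with a different label
    if not pos_mask.any():
        raise ValueError("the batch needs at least one positive pair")

    # log(sum of exp(negative logits)) for each anchor, computed stably.
    neg_logits = np.where(neg_mask, logits, -np.inf)
    log_neg = np.logaddexp.reduce(neg_logits, axis=1)
    # Denominator for pair (i, j): exp(logit_ij) + sum over the negatives of anchor i.
    log_denom = np.logaddexp(logits, log_neg[:, None])
    # Average the negative log-probability over all positive pairs.
    return float(-(logits - log_denom)[pos_mask].mean())


# Toy batch: 2 molecules x 4 modalities, 3-dimensional embeddings.
aligned = np.array([[1.0, 0.0, 0.0]] * 4 + [[0.0, 1.0, 0.0]] * 4)
molecule_ids = np.array([0, 0, 0, 0, 1, 1, 1, 1])
mixed_ids = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # deliberately wrong grouping

loss_aligned = supervised_contrastive_loss(aligned, molecule_ids)
loss_mixed = supervised_contrastive_loss(aligned, mixed_ids)
print(loss_aligned < loss_mixed)
```

When the four views of each molecule coincide and the two molecules are orthogonal, the loss is close to zero; grouping the same vectors under the wrong labels raises it, which is the behavior a contrastive alignment objective is meant to reward and penalize.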

Technical Details

SELFormerMM uses four modality-specific branches, each projected into a shared 768-dimensional space:

Sequence branch: SELFormer, a RoBERTa-based model over SELFIES.
Text branch: frozen SciBERT embeddings.
Structure branch: frozen Uni-Mol features computed from 3D conformers.
Knowledge-graph branch: DMGI embeddings derived from the CROssBARv2 biological knowledge graph.

Non-linear MLP projection heads align the modalities via a supervised contrastive (SINCERE) loss during pretraining on approximately three million molecules. For downstream prediction, the concatenated multimodal embeddings feed task heads covering binary classification, multilabel classification, and regression. The authors report gains over single-modality baselines on molecular property benchmarks.

Code is licensed under GPL-3.0, and the preprint is released under CC-BY.
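
The data flow described above (per-modality projection into the shared space, then concatenation for the task heads) can be sketched as a forward pass in NumPy. This is a shape-level illustration under stated assumptions, not the authors' implementation: the encoder output widths, the two-layer ReLU head, the label count, and all weights are placeholders, and random arrays stand in for the real encoder outputs. Only the 768-dimensional shared space comes from the text; the 3072-wide fused vector follows from assuming the four projected embeddings are the ones concatenated.

```python
import numpy as np

SHARED_DIM = 768  # shared space width stated in the text

# Placeholder encoder output widths; NOT taken from the paper.
ENCODER_DIMS = {"selfies": 768, "text": 768, "structure": 512, "kg": 128}


class ProjectionHead:
    """Two-layer MLP with ReLU mapping encoder features to the shared space."""

    def __init__(self, in_dim, out_dim, rng):
        self.w1 = rng.normal(0.0, in_dim ** -0.5, size=(in_dim, out_dim))
        self.b1 = np.zeros(out_dim)
        self.w2 = rng.normal(0.0, out_dim ** -0.5, size=(out_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def __call__(self, x):
        hidden = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU non-linearity
        return hidden @ self.w2 + self.b2


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(0)
heads = {name: ProjectionHead(dim, SHARED_DIM, rng) for name, dim in ENCODER_DIMS.items()}

# Random arrays stand in for the outputs of the four encoders.
batch_size = 5
features = {name: rng.normal(size=(batch_size, dim)) for name, dim in ENCODER_DIMS.items()}

# Project every modality into the shared space, then concatenate.
projected = {name: heads[name](features[name]) for name in ENCODER_DIMS}
fused = np.concatenate([projected[name] for name in ENCODER_DIMS], axis=1)

# Untrained linear task heads on the fused vector (hypothetical sizes).
n_labels = 12
w_binary = rng.normal(0.0, 0.01, size=(fused.shape[1], 1))
w_multilabel = rng.normal(0.0, 0.01, size=(fused.shape[1], n_labels))
w_regression = rng.normal(0.0, 0.01, size=(fused.shape[1], 1))

binary_prob = sigmoid(fused @ w_binary)          # one probability per molecule
multilabel_prob = sigmoid(fused @ w_multilabel)  # one probability per label
regression_out = fused @ w_regression            # unbounded real value

print(fused.shape, binary_prob.shape, multilabel_prob.shape, regression_out.shape)
```

Because the weights are untrained, only the shapes and output ranges are meaningful here; in pretraining, the projected embeddings would be the inputs to the contrastive loss.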

Applications

SELFormerMM targets molecular property prediction tasks central to early-stage drug discovery, including ADMET-style endpoints, toxicity, and bioactivity classification and regression. Cheminformatics and machine-learning researchers can fine-tune the pretrained model or use its multimodal embeddings as features for their own predictors. The knowledge-graph component makes it particularly relevant when biological context, such as known interactions, is expected to inform a molecule's behavior beyond its chemical structure alone.
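
Using precomputed embeddings as features for a separate predictor can be sketched as follows. The example fits a closed-form ridge regression in NumPy on synthetic data: the random `embeddings` array is a stand-in for vectors exported from the pretrained model, the `targets` are simulated rather than measured, and the helper names `fit_ridge` and `predict_ridge` are hypothetical. It shows the workflow only and says nothing about real benchmark performance.

```python
import numpy as np


def fit_ridge(x, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalised bias term."""
    xb = np.hstack([x, np.ones((x.shape[0], 1))])  # append a bias column
    penalty = alpha * np.eye(xb.shape[1])
    penalty[-1, -1] = 0.0  # do not shrink the bias
    return np.linalg.solve(xb.T @ xb + penalty, xb.T @ y)


def predict_ridge(x, weights):
    xb = np.hstack([x, np.ones((x.shape[0], 1))])
    return xb @ weights


rng = np.random.default_rng(42)
n_molecules, embed_dim = 200, 32  # toy sizes; real embeddings would be wider

# Synthetic stand-ins for precomputed embeddings and a measured property.
embeddings = rng.normal(size=(n_molecules, embed_dim))
true_weights = rng.normal(size=embed_dim)
targets = embeddings @ true_weights + 0.5 + 0.01 * rng.normal(size=n_molecules)

# Simple split: fit on the first 150 molecules, evaluate on the rest.
train, test = slice(0, 150), slice(150, None)
weights = fit_ridge(embeddings[train], targets[train])
predictions = predict_ridge(embeddings[test], weights)
rmse = float(np.sqrt(np.mean((predictions - targets[test]) ** 2)))
print(f"held-out RMSE: {rmse:.4f}")
```

Keeping the embeddings frozen and training only a light predictor is the cheaper of the two options mentioned above; fine-tuning the pretrained model instead would update the encoder weights as well.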

Impact

SELFormerMM contributes to a growing line of multimodal molecular models that move past single-view representations by fusing structure with text and curated biological knowledge. Its open code and released checkpoints make the model accessible for the cheminformatics community to run and extend. As a recent preprint, its reported advantages over single-modality baselines await independent benchmarking, and its reliance on several frozen external encoders means performance is partly bounded by those upstream components.

Citation

SELFormerMM: multimodal molecular representation learning via SELFIES, structure, text, and knowledge graph integration

Ulusoy, E., et al. (2026) SELFormerMM: multimodal molecular representation learning via SELFIES, structure, text, and knowledge graph integration. bioRxiv.

DOI: 10.64898/2026.03.17.712340


Citations

Total citations: 58
Influential: 3
References: 50

GitHub

Stars: 3
Forks: 0
Open issues: 0
Contributors: 2
Last push: 1 month ago
Language: Python


Openness

bio.rodeo openness: Open weights (open weights, closed recipe)
Overall: 55 (Partial)
Usability (can I run it?): 63
Reproducibility (can I retrain it?): 45

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository
Research Paper
Dataset
