bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Small molecule

SELFormerMM

Hacettepe University

Multimodal molecular foundation model fusing SELFIES sequences, 2D graphs, text descriptions, and knowledge-graph embeddings via contrastive pretraining for property prediction.

Released: March 2026

Molecular representation learning underpins much of modern computational drug discovery, where the goal is to predict properties such as toxicity, solubility, or bioactivity from a molecule's structure. Most existing models, however, rely on a single view of a molecule, typically a sequence notation like SMILES or a 2D structural graph, and therefore miss complementary information available in other modalities such as natural-language descriptions or curated biological knowledge graphs.

SELFormerMM, developed by the HUBioDataLab at Hacettepe University and posted in March 2026, is a multimodal molecular foundation model that integrates four modalities: SELFIES sequence notations, 2D structural graphs, textual descriptions, and knowledge-graph-derived biological interaction data. It extends the earlier SELFormer chemical language model by aligning these modality-specific representations through contrastive pretraining on roughly three million molecules, producing embeddings that downstream models can use for property prediction.

By fusing chemical structure with biological context drawn from a knowledge graph, SELFormerMM aims to capture aspects of molecules that pure structure-based models overlook, and the authors report improved performance over single-modality alternatives on molecular property tasks.

#Key Features

  • Four-modality integration: Combines SELFIES sequences, 2D structural graphs, text descriptions, and knowledge-graph embeddings in a shared representation space.
  • Contrastive alignment: A supervised contrastive objective aligns the modality-specific branches so that complementary signals reinforce one another.
  • Built on SELFormer: The sequence branch reuses the RoBERTa-based SELFormer chemical language model, extending an established backbone rather than training from scratch.
  • Biological knowledge grounding: Knowledge-graph embeddings from a biological interaction graph inject context beyond chemical structure, such as known target relationships.
  • Open code and checkpoint: Code is released under GPL-3.0, with a pretrained checkpoint and datasets distributed through Hugging Face.

#Technical Details

SELFormerMM uses four modality-specific branches projected into a shared 768-dimensional space. The sequence branch is SELFormer (a RoBERTa-based model over SELFIES); the text branch uses frozen SciBERT embeddings; the structure branch uses frozen Uni-Mol features from 3D conformers; and the knowledge-graph branch uses DMGI embeddings derived from the CROssBARv2 biological knowledge graph. Non-linear MLP projection heads align the modalities via a supervised contrastive (SINCERE) loss during pretraining on approximately three million molecules. For downstream prediction, concatenated multimodal embeddings feed task heads covering binary classification, multilabel classification, and regression. The authors report gains over single-modality baselines on molecular property benchmarks. Code is licensed under GPL-3.0, and the preprint is released under CC-BY.
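
As a rough illustration of the alignment step described above, the sketch below computes a symmetric InfoNCE loss between two sets of embeddings that already live in the shared 768-dimensional space. It is a generic stand-in, not the SINCERE loss the authors use, and it assumes that the positive pair for each molecule is that same molecule's embedding from the other modality. The function names, the temperature of 0.07, and the random toy embeddings are all hypothetical.

```python
import math
import random

SHARED_DIM = 768  # shared embedding width stated in the text


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, candidates, temperature=0.07):
    """Mean cross-entropy of selecting candidate i for anchor i among all candidates."""
    anchors = [l2_normalize(a) for a in anchors]
    candidates = [l2_normalize(c) for c in candidates]
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, c) / temperature for c in candidates]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


def symmetric_alignment_loss(mod_a, mod_b, temperature=0.07):
    """Average the loss in both directions (a -> b and b -> a)."""
    return 0.5 * (info_nce(mod_a, mod_b, temperature)
                  + info_nce(mod_b, mod_a, temperature))


# Toy data (assumption): row i of each list belongs to the same molecule.
rng = random.Random(0)
batch_size = 4
seq_emb = [[rng.gauss(0, 1) for _ in range(SHARED_DIM)] for _ in range(batch_size)]
# A second modality that agrees with the first, up to small noise.
graph_emb = [[x + 0.1 * rng.gauss(0, 1) for x in row] for row in seq_emb]
# The same vectors, but rotated so that no row matches its true partner.
graph_misaligned = graph_emb[1:] + graph_emb[:1]

aligned_loss = symmetric_alignment_loss(seq_emb, graph_emb)
misaligned_loss = symmetric_alignment_loss(seq_emb, graph_misaligned)
print(f"aligned: {aligned_loss:.4f}  misaligned: {misaligned_loss:.4f}")
```

On this toy data, correctly paired embeddings give a loss close to zero while mismatched pairs give a much larger one. That gap is the training signal a contrastive objective uses to pull the modality branches together.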

#Applications

SELFormerMM targets molecular property prediction tasks central to early-stage drug discovery, including ADMET-style endpoints, toxicity, and bioactivity classification and regression. Cheminformatics and machine-learning researchers can fine-tune the pretrained model or use its multimodal embeddings as features for their own predictors. The knowledge-graph component makes it particularly relevant when biological context, such as known interactions, is expected to inform a molecule's behavior beyond its chemical structure alone.
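
A minimal sketch of the embeddings-as-features workflow, assuming each modality yields one fixed-length vector per molecule. The embeddings here are random toy data at a reduced width of 8 (standing in for 768), the label rule is invented for the demo, and the logistic regression is a simple placeholder rather than the task heads from the released code. All function names are hypothetical.

```python
import math
import random

DIM = 8  # toy width per modality; the text states 768 for the real model


def concat_modalities(seq, graph, text, kg):
    """Concatenate the four per-modality embeddings into one feature vector."""
    return list(seq) + list(graph) + list(text) + list(kg)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, lr=0.1, epochs=200):
    """Fit a binary logistic regression with plain per-sample gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            grad = predict_proba(weights, bias, x) - y
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


rng = random.Random(0)


def random_embedding():
    return [rng.gauss(0, 1) for _ in range(DIM)]


# Toy molecules (assumption): four random embeddings each.
molecules = [
    {"seq": random_embedding(), "graph": random_embedding(),
     "text": random_embedding(), "kg": random_embedding()}
    for _ in range(40)
]
# Invented label rule: the class depends only on the knowledge-graph embedding.
labels = [1 if m["kg"][0] > 0 else 0 for m in molecules]

features = [concat_modalities(m["seq"], m["graph"], m["text"], m["kg"])
            for m in molecules]
weights, bias = train_logistic(features, labels)
predictions = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in features]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(f"feature width: {len(features[0])}, training accuracy: {accuracy:.2f}")
```

The label in this demo is carried only by the knowledge-graph slice of the feature vector, which mirrors the case where biological context adds signal that the structural views lack. The accuracy is measured on the training data, so it shows only that the predictor can read that signal, not that it generalizes.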

#Impact

SELFormerMM contributes to a growing line of multimodal molecular models that move past single-view representations by fusing structure with text and curated biological knowledge. Its open code and released checkpoint make the work straightforward for the cheminformatics community to reproduce and extend. Because it is a recent preprint, its reported advantages over single-modality baselines still await independent benchmarking, and its reliance on several frozen external encoders means its performance is partly bounded by those upstream components.

Tags

molecular_property_prediction, drug_discovery, transformer, graph_neural_network, foundation_model, contrastive_learning, multimodal, cheminformatics