bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Small molecule foundation models
Small molecule · Protein

UBio-MolFM

IQuestLab

A universal all-atom machine-learning force field foundation model with reported ab initio-level accuracy on solvated biomolecular systems of up to ~1,500 atoms.

Released: February 2026

Machine-learning interatomic potentials (MLIPs) promise the accuracy of quantum-mechanical methods such as density functional theory (DFT) at a fraction of the cost. For biology, though, a persistent "scale–accuracy gap" remains: the systems that matter (solvated proteins, ions, peptides) are large and heterogeneous, while the high-fidelity quantum data needed to train accurate force fields is easiest to generate for small, gas-phase molecules. Closing that gap requires both the right training data and an architecture that stays accurate as systems grow.

UBio-MolFM, developed by the UBio Team at IQuestLab (IQuest Research) and posted to arXiv in February 2026, is an all-atom machine-learning force field foundation model built specifically for biological systems. It combines a new bio-focused training dataset (UBio-Mol26), a linear-scaling equivariant transformer backbone (E2Former-V2), and a three-stage curriculum-learning protocol, with the explicit goal of delivering ab initio-level fidelity on solvated, biomolecule-scale systems.

The authors report ab initio-level accuracy on biomolecular systems of roughly 1,500 atoms across benchmarks that include liquid-water structure, ionic solvation, and peptide folding dynamics, while reproducing realistic molecular-dynamics observables. Code and a pretrained checkpoint are publicly released, which distinguishes UBio-MolFM from many contemporaneous preprints.

#Key Features

  • Bio-focused training data (UBio-Mol26): Built with a multi-fidelity "two-pronged" strategy combining systematic enumeration with sampling of native protein environments, covering systems up to ~1,200 atoms.
  • Linear-scaling equivariant transformer (E2Former-V2): An equivariant architecture with sparsification and long-/short-range modeling that the authors report achieves roughly 4x higher inference throughput on large systems.
  • Three-stage curriculum learning: Training moves from energy initialization through force-consistency refinement to force-focused supervision, which is intended to address energy-offset issues.
  • Ab initio-level accuracy at scale: Validated on liquid water, ionic solvation, and peptide folding for systems up to ~1,500 atoms, with realistic MD observables.
  • Released code and weights: Implementation (MIT-licensed), a pretrained checkpoint (IQuest-UBio-MolFM-V1), and a protein dataset (UBio-Protein26) are publicly available.
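The curriculum idea in the list above can be sketched as a stage-dependent weighting of energy and force errors. This is a minimal illustration, not the authors' training code: the stage names follow the description above, while the numeric weights, the `StageWeights` class, and the `curriculum_loss` function are hypothetical. The toy data shows why shifting emphasis toward forces helps with energy offsets: forces are gradients of the energy, so a constant shift in reference energies leaves them unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageWeights:
    energy: float  # weight on the energy error term
    force: float   # weight on the force error term


# Hypothetical weights, chosen only to illustrate the progression;
# the values actually used for UBio-MolFM are not given here.
CURRICULUM = {
    "energy_init": StageWeights(energy=1.0, force=0.0),
    "force_consistency": StageWeights(energy=1.0, force=1.0),
    "force_focused": StageWeights(energy=0.1, force=10.0),
}


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def curriculum_loss(stage, e_pred, e_ref, f_pred, f_ref):
    """Weighted sum of energy and force errors for the given stage."""
    w = CURRICULUM[stage]
    return w.energy * mse(e_pred, e_ref) + w.force * mse(f_pred, f_ref)


# Toy case: forces are predicted perfectly, but the reference energies
# carry a constant offset of 5.0 (e.g. a different reference level).
e_pred = [1.0, 2.0, 3.0]
e_ref = [e + 5.0 for e in e_pred]
forces = [0.5, -0.25, 0.0]

losses = {
    stage: curriculum_loss(stage, e_pred, e_ref, forces, forces)
    for stage in CURRICULUM
}
```

In this toy case the whole loss comes from the energy offset, so the force-focused stage penalizes it ten times less than the energy-initialization stage while still rewarding correct forces.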

#Technical Details

UBio-MolFM pairs the E2Former-V2 backbone, an equivariant transformer whose cost scales linearly with system size, with the UBio-Mol26 dataset. The dataset is generated with a multi-fidelity strategy that combines systematic enumeration with sampling of native protein environments and spans systems of up to ~1,200 atoms. Training follows a three-stage curriculum (energy initialization, force-consistency refinement, and force-focused supervision) intended to stabilize learning and handle energy offsets. Reported benchmarks reach ab initio-level accuracy on biomolecular systems near 1,500 atoms.

The released suite supports training, inference, and molecular-dynamics simulation. It targets Python 3.12 and PyTorch 2.7.0 and supports LMDB, SPICE, and OC20 data formats. The README notes scaling to roughly 100,000 atoms on a single GPU, a capacity figure well beyond the ~1,500-atom systems on which accuracy is reported. An accompanying Hugging Face dataset, UBio-Protein26 (~5 million protein structures), documents the data associated with the released checkpoint.
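To see why a model built on local, cutoff-limited interactions can scale linearly with atom count, consider the generic cell-list neighbor search below. It is a textbook technique shown only for intuition and is not a description of E2Former-V2's sparsification or long-/short-range scheme. The function name `neighbor_pairs` and the sample coordinates are illustrative assumptions, and the sketch uses open (non-periodic) boundaries.

```python
from collections import defaultdict
from itertools import product


def neighbor_pairs(positions, cutoff):
    """Return the set of index pairs (i, j), i < j, closer than `cutoff`.

    Atoms are binned into cubic cells of edge `cutoff`, so each atom is
    only compared with atoms in its own and the 26 adjacent cells. At
    roughly constant density this costs O(N) instead of O(N^2).
    """
    cells = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (int(x // cutoff), int(y // cutoff), int(z // cutoff))
        cells[key].append(idx)

    cutoff_sq = cutoff * cutoff
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get() avoids creating empty cells while iterating.
            others = cells.get((cx + dx, cy + dy, cz + dz))
            if not others:
                continue
            for i in members:
                for j in others:
                    if i < j:
                        d_sq = sum(
                            (a - b) ** 2
                            for a, b in zip(positions[i], positions[j])
                        )
                        if d_sq < cutoff_sq:
                            pairs.add((i, j))
    return pairs


# Four atoms; atoms 1 and 2 sit in different cells but within the cutoff.
positions = [
    (0.0, 0.0, 0.0),
    (0.9, 0.0, 0.0),
    (1.1, 0.0, 0.0),
    (5.0, 5.0, 5.0),
]
pairs = neighbor_pairs(positions, cutoff=1.0)
```

Here atoms 0 and 1 and atoms 1 and 2 are neighbors, while atom 3 is isolated. Production MD codes add periodic boundaries and neighbor-list reuse on top of this idea.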

#Applications

UBio-MolFM targets computational chemists and structural biologists who need accurate, scalable molecular dynamics of biomolecular systems: for example, simulating peptide folding, ion solvation, or protein–solvent interactions where classical force fields lack accuracy and DFT is too expensive. Because the model ships with training, inference, and MD tooling, it can be used directly to run simulations or fine-tuned on domain-specific quantum data, lowering the barrier to first-principles-quality dynamics for larger biological assemblies.
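In a molecular-dynamics run, a force field such as this one supplies the forces at every time step and an integrator advances the atoms. The sketch below shows that division of labor with a standard velocity Verlet integrator and a toy harmonic force standing in for the learned model. It does not use the released UBio-MolFM tooling: `velocity_verlet`, `harmonic_force`, and `total_energy` are hypothetical helpers, and coordinates are stored as a flat list of degrees of freedom for brevity.

```python
def velocity_verlet(positions, velocities, masses, force_fn, dt, n_steps):
    """Integrate Newton's equations with the velocity Verlet scheme.

    `force_fn` maps a list of coordinates to a list of forces; in a real
    workflow this is where a machine-learning force field would be called.
    """
    x = list(positions)
    v = list(velocities)
    f = force_fn(x)
    for _ in range(n_steps):
        # Half-step velocity update with the current forces.
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
        # Full-step position update.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Recompute forces at the new positions, then finish the velocity step.
        f = force_fn(x)
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
    return x, v


def harmonic_force(x, k=1.0):
    """Toy stand-in for a learned force field: F = -k * x per coordinate."""
    return [-k * xi for xi in x]


def total_energy(x, v, masses, k=1.0):
    """Kinetic plus harmonic potential energy of the toy system."""
    kinetic = sum(0.5 * m * vi * vi for m, vi in zip(masses, v))
    potential = sum(0.5 * k * xi * xi for xi in x)
    return kinetic + potential


masses = [1.0]
x0, v0 = [1.0], [0.0]
e_start = total_energy(x0, v0, masses)
x1, v1 = velocity_verlet(x0, v0, masses, harmonic_force, dt=0.01, n_steps=1000)
e_end = total_energy(x1, v1, masses)
```

Energy drift between `e_start` and `e_end` is a common sanity check: a conservative force field combined with a symplectic integrator such as velocity Verlet should keep total energy nearly constant over long runs.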

#Impact

By coupling a bio-specific multi-fidelity dataset with a linear-scaling equivariant architecture and a force-focused curriculum, UBio-MolFM is a concrete attempt to push machine-learning force fields from small molecules toward solvated, biomolecule-scale simulation. The public release of code, weights, and a large protein dataset makes the work immediately testable by the community. As a February 2026 preprint, its accuracy claims await independent benchmarking and peer review, but the combination of openness and explicit focus on biological scale is a meaningful step for the MLIP field.

Tags

molecular_dynamics, force_field, energy_prediction, equivariant_transformer, foundation_model, curriculum_learning, molecular_simulation, biomolecules