BioLM-Score

Protein-ligand scoring function that conditions probabilistic geometric potentials on language model priors to rank docked poses and estimate binding affinity.

Released: February 2026

Scoring functions sit at the heart of structure-based drug discovery: given a protein pocket and a candidate ligand pose, they estimate how strongly the two bind and rank poses and compounds accordingly. Classical empirical and knowledge-based potentials are fast and interpretable but often generalize poorly, while many deep-learning scorers trade interpretability and physical grounding for raw accuracy. Bridging this gap — keeping the statistical rigor of geometric potentials while borrowing the representational power of modern learned encoders — remains an open challenge.

BioLM-Score, introduced by Yang and colleagues at Shenzhen University in a February 2026 arXiv preprint, addresses this by conditioning probabilistic geometric potentials on priors from biomolecular language models. The method pairs modality-specific, structure-aware encoders for the protein and the ligand with language-model embeddings, and feeds the resulting representations into a mixture density network that predicts multimodal distributions over interatomic distances. Binding scores are then derived as likelihood-based statistics from these learned distance distributions, rather than from a black-box regression head.

This design yields a scoring function that the authors describe as principled and practical — grounded in distance statistics, informed by language-model priors, and competitive with or better than existing approaches on standard benchmarks.

Key Features

Language-prior conditioning: Biomolecular language model embeddings inform structure-aware encoders, injecting evolutionary and chemical priors into the geometric scoring model.
Probabilistic geometric potentials: A mixture density network predicts multimodal interatomic distance distributions, from which likelihood-based binding scores are derived in a statistically grounded way.
Modality-specific, structure-aware encoders: Separate encoders for the protein and the ligand capture each modality's geometry before they are combined for scoring.
Interpretable and efficient: By grounding scores in distance likelihoods rather than opaque regression, the method aims to combine interpretability and computational efficiency with strong generalization.
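
The likelihood-based scoring idea described above can be illustrated with a minimal sketch. It assumes the mixture density network has already produced Gaussian mixture parameters (weights, means, standard deviations) for each protein-ligand atom pair. The function names, the 8.0 Å cutoff, the toy numbers, and the convention that a higher summed log-likelihood means a better pose are all assumptions made for illustration, not details taken from the preprint.

```python
import math

def mixture_log_density(d, weights, means, sigmas):
    """Log-density of distance d under a 1-D Gaussian mixture."""
    # Log of each weighted component: log(pi_k) + log N(d | mu_k, sigma_k)
    terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi) - math.log(s)
        - 0.5 * ((d - m) / s) ** 2
        for w, m, s in zip(weights, means, sigmas)
    ]
    # Log-sum-exp keeps the sum stable when components are tiny
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def pose_score(pairs, cutoff=8.0):
    """Sum pair log-likelihoods for pairs within a (hypothetical) cutoff."""
    return sum(
        mixture_log_density(d, w, m, s)
        for d, w, m, s in pairs
        if d <= cutoff
    )

# Toy mixture (made-up values): weights, means, sigmas in angstroms
mix = ([0.7, 0.3], [3.0, 5.0], [0.4, 0.8])

# Each pair: (observed distance, weights, means, sigmas)
near_native = [(3.1, *mix), (4.9, *mix)]
decoy = [(6.5, *mix), (7.5, *mix)]

print(pose_score(near_native) > pose_score(decoy))
```

In this toy setup the pose whose observed distances fall near the mixture modes receives the higher score, which is the behavior a likelihood-based potential is meant to capture.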

Technical Details

BioLM-Score combines structure-aware, modality-specific encoders for protein and ligand with embeddings from biomolecular language models, then passes the joint representation to a mixture density network that models multimodal distributions over interatomic distances. Binding scores are computed as statistically grounded likelihood-based quantities from these distributions. The authors benchmark the method on CASF-2016, the widely used Comparative Assessment of Scoring Functions test set, reporting improvements across multiple drug-discovery-relevant tasks (such as scoring power, ranking power, and related metrics) and characterizing the result as state of the art among the compared scoring functions. Exact parameter counts, the specific language models used, and full hyperparameters have not yet been released, and because the work is a recent preprint, no code or trained weights have been published.
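
For readers unfamiliar with CASF-style evaluation, the sketch below shows the two correlation statistics conventionally associated with scoring power (Pearson correlation between predicted scores and experimental affinities) and ranking power (Spearman rank correlation). Whether the preprint reports exactly these statistics is not stated here, and the benchmark's own protocol, such as per-target grouping, is not reproduced. The function names and the toy numbers are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up predicted scores and experimental affinities (e.g. pK values)
predicted = [0.1, 0.5, 0.9, 2.0]
experimental = [4.2, 5.0, 6.1, 9.3]

print(pearson(predicted, experimental))
print(spearman(predicted, experimental))
```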

Applications

BioLM-Score targets computational chemists and structure-based drug-discovery teams who need to rank docked poses and prioritize compounds by predicted binding. A more accurate and interpretable scoring function improves the reliability of virtual screening, pose selection in docking pipelines, and lead-optimization triage, where small differences in predicted affinity translate into which molecules advance to synthesis and assay. Because scores are grounded in interatomic distance statistics, the method may also offer diagnostic insight into which contacts drive a predicted ranking.
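
The kind of contact-level diagnosis mentioned above could look like the following sketch. It assumes a pose's score is a sum of per-contact log-likelihood terms and that higher totals are better. The contact labels, the numbers, and the helper names are hypothetical, and nothing here reflects an interface published with the method.

```python
def rank_poses(pose_terms):
    """Sort pose names by total score, best (highest) first."""
    totals = {name: sum(terms.values()) for name, terms in pose_terms.items()}
    return sorted(totals, key=totals.get, reverse=True)

def top_contacts(terms, k=2):
    """Return the k contacts contributing most to a pose's score."""
    return sorted(terms, key=terms.get, reverse=True)[:k]

# Hypothetical per-contact log-likelihood terms for two docked poses.
# Keys are (protein atom, ligand atom) labels invented for illustration.
poses = {
    "pose_a": {
        ("ASP189:OD1", "N1"): -0.4,
        ("SER195:OG", "O2"): -1.1,
        ("GLY216:N", "C7"): -3.0,
    },
    "pose_b": {
        ("ASP189:OD1", "N1"): -2.9,
        ("SER195:OG", "O2"): -1.5,
        ("GLY216:N", "C7"): -2.2,
    },
}

best = rank_poses(poses)[0]
print(best, top_contacts(poses[best]))
```

Because the total decomposes over contacts, the same terms that produce the ranking also indicate which interactions favor the top pose.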

Impact

By framing protein-ligand scoring as language-prior-conditioned probabilistic geometric potential estimation, BioLM-Score offers a route to combine the interpretability of distance-based potentials with the generalization of learned representations, an attractive middle ground between classical and purely data-driven scorers. Its reported state-of-the-art CASF-2016 results position it as a candidate component for screening pipelines. However, the work is a February 2026 preprint without released code or weights, so those results still need peer review and independent reproduction before the method is adopted in production workflows.

Citation

BioLM-Score: Language-Prior Conditioned Probabilistic Geometric Potentials for Protein-Ligand Scoring

Preprint

Yang, Z., et al. (2026) BioLM-Score: Language-Prior Conditioned Probabilistic Geometric Potentials for Protein-Ligand Scoring. arXiv.org.

DOI: 10.48550/arXiv.2602.18476

Citations

Total citations: 0
Influential: 0
References: 43

Openness

bio.rodeo openness: 11 (Closed), low usability and reproducibility

Usability (can I run it?): 7

Reproducibility (can I retrain it?): 14

Model Openness Framework: Unclassified

Restrictive license on core components

Resources

Research Paper
