A protein-ligand scoring function that conditions probabilistic geometric potentials on biomolecular language model priors, with reported state-of-the-art results on CASF-2016.
Scoring functions sit at the heart of structure-based drug discovery: given a protein pocket and a candidate ligand pose, they estimate how strongly the two bind and rank poses and compounds accordingly. Classical empirical and knowledge-based potentials are fast and interpretable but often generalize poorly, while many deep-learning scorers trade interpretability and physical grounding for raw accuracy. Bridging this gap, by keeping the statistical rigor of geometric potentials while borrowing the representational power of modern learned encoders, remains an open challenge.
BioLM-Score, introduced by Yang and colleagues at Shenzhen University in a February 2026 arXiv preprint, addresses this by conditioning probabilistic geometric potentials on priors from biomolecular language models. The method pairs modality-specific, structure-aware encoders for the protein and the ligand with language-model embeddings, and feeds the resulting representations into a mixture density network that predicts multimodal distributions over interatomic distances. Binding scores are then derived as likelihood-based statistics from these learned distance distributions, rather than from a black-box regression head.
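To make the likelihood-based scoring idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each protein-ligand atom pair has a one-dimensional Gaussian mixture over its distance, and that a pose is scored by summing the log-likelihoods of its observed distances. The preprint's exact statistic is not specified here, so the summed log-likelihood, the function names, and the hand-set mixture parameters are all illustrative assumptions. In the actual method these parameters would be predicted by the mixture density network.

```python
import math

def mixture_log_density(d, weights, means, sigmas):
    """Log density of distance d under a 1-D Gaussian mixture."""
    # Per-component log(weight * Normal(d; mean, sigma)).
    log_terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi)
        - 0.5 * ((d - m) / s) ** 2
        for w, m, s in zip(weights, means, sigmas)
    ]
    # Log-sum-exp over components for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))

def pose_score(distances, mixtures):
    """Assumed score: sum of per-pair log-likelihoods (higher is better)."""
    return sum(mixture_log_density(d, *mix) for d, mix in zip(distances, mixtures))

# Hypothetical mixture parameters for two atom pairs:
# (weights, means in angstroms, standard deviations in angstroms).
mixtures = [
    ([0.7, 0.3], [3.0, 5.0], [0.3, 0.8]),
    ([0.5, 0.5], [3.8, 6.5], [0.4, 1.0]),
]
near_mode_pose = [3.0, 3.8]  # distances sitting on the dominant modes
off_mode_pose = [4.2, 5.2]   # distances between the modes

print(pose_score(near_mode_pose, mixtures) > pose_score(off_mode_pose, mixtures))  # prints True
```

Because the score is a sum over atom pairs, each pair's term can be inspected on its own, which is one way a distance-based formulation stays more transparent than a single regression output.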
The authors describe the resulting scoring function as both principled and practical: grounded in distance statistics, informed by language-model priors, and competitive with or better than existing approaches on standard benchmarks.
The authors benchmark BioLM-Score on CASF-2016, the widely used Comparative Assessment of Scoring Functions test set, and report improvements across multiple drug-discovery-relevant tasks (such as scoring, ranking, and related power metrics), characterizing the result as state of the art among the compared scoring functions. Exact parameter counts, the specific language models used, and full hyperparameters await the complete release; because the work is a recent preprint, no code or trained weights have been published yet.
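For readers unfamiliar with the benchmark's metrics, the sketch below shows how two of them are conventionally computed: scoring power as the Pearson correlation between predicted scores and experimental binding data, and ranking power as the Spearman rank correlation among ligands of the same target. The function names are illustrative, the rank helper assumes no ties, and the numbers are made up for demonstration; they are not results from the preprint.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks in ascending order; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = float(rank)
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up illustrative values for five ligands of one target.
predicted_scores = [-4.1, -2.3, -3.0, -1.2, -0.5]
experimental_affinity = [4.0, 6.1, 6.9, 7.8, 9.2]

print(round(spearman(predicted_scores, experimental_affinity), 3))  # prints 0.9
```

In the toy data only the second and third ligands are swapped relative to the experimental order, which is why the rank correlation falls just short of 1.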
BioLM-Score targets computational chemists and structure-based drug-discovery teams who need to rank docked poses and prioritize compounds by predicted binding. A more accurate and interpretable scoring function improves the reliability of virtual screening, pose selection in docking pipelines, and lead-optimization triage, where small differences in predicted affinity translate into which molecules advance to synthesis and assay. Because scores are grounded in interatomic distance statistics, the method may also offer diagnostic insight into which contacts drive a predicted ranking.
By framing protein-ligand scoring as language-prior-conditioned probabilistic geometric potential estimation, BioLM-Score offers a route to combine the interpretability of distance-based potentials with the generalization of learned representations, an attractive middle ground between classical and purely data-driven scorers. Its reported state-of-the-art CASF-2016 results position it as a candidate component for screening pipelines. Because the work is a February 2026 preprint without released code or weights, however, those results await peer review and independent reproduction before the method can be adopted in production workflows.