Johannes Kepler University Linz
Contrastive geometric model that unifies structure-based and ligand-based drug design in one checkpoint, enabling zero-shot virtual screening, target fishing, and pocket selection.
ConGLUDe (Contrastive Geometric Learning Unlocks Unified Structure- and Ligand-Based Drug Design) is a single model that bridges two historically separate paradigms in computational drug discovery. Structure-based methods reason about a protein's three-dimensional pocket to find or design ligands that fit it, while ligand-based methods reason from known active molecules without an explicit structure. ConGLUDe learns one shared representation in which proteins, binding pockets, and ligands are embedded together, so the same model can be queried from either direction.
Developed at the Institute for Machine Learning at Johannes Kepler University Linz (the group of Günter Klambauer, with co-authors including industry researcher Daniel Kuhn) and posted to arXiv in January 2026, ConGLUDe is trained with a contrastive objective on both protein-ligand complexes and large-scale bioactivity data. This combination lets it align the geometry of a binding site with the chemistry of its ligands while also learning from the much larger body of measured activity values that lack structural information.
The authors position ConGLUDe as a step toward general-purpose foundation models for drug discovery: rather than building a bespoke model per task, a single trained checkpoint performs several distinct tasks zero-shot, without task-specific fine-tuning.
ConGLUDe is a contrastive geometric learning model. It comprises a protein encoder that captures the geometry of a target and its candidate binding sites, and a deliberately fast ligand encoder that captures molecular structure, both trained to project into a common embedding space. The contrastive objective pulls together the embeddings of cognate protein-ligand and pocket-ligand pairs while pushing apart non-binders. Training draws on two complementary data sources: protein-ligand complex structures, which supply geometric grounding, and large-scale bioactivity datasets, which supply broad coverage of measured activity without requiring structures. Because retrieval and ranking reduce to nearest-neighbor operations in the shared space, the same trained model addresses screening (rank ligands for a target), target fishing (rank targets for a ligand), and pocket selection (rank candidate pockets for a ligand) with no task-specific fine-tuning. Reported results include competitive zero-shot virtual screening, substantial gains on target fishing, and state-of-the-art ligand-conditioned pocket selection. As of the January 2026 preprint, no public code or model weights were available.
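To make the idea of a contrastive objective over a shared embedding space concrete, the following is a minimal NumPy sketch of a symmetric InfoNCE-style loss. It is an illustration under stated assumptions, not ConGLUDe's actual training objective: the function name `symmetric_info_nce`, the cosine-similarity scoring, the use of in-batch negatives, and the temperature value are all hypothetical choices made here, and the preprint's loss may differ in each of them.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(protein_emb, ligand_emb, temperature=0.07):
    """Hypothetical symmetric InfoNCE loss (an assumption, not the paper's loss).

    Row i of protein_emb and row i of ligand_emb are treated as a cognate
    (binding) pair; every other row in the batch serves as a negative.
    """
    p = l2_normalize(np.asarray(protein_emb, dtype=np.float64))
    l = l2_normalize(np.asarray(ligand_emb, dtype=np.float64))
    logits = p @ l.T / temperature  # shape: (batch, batch)
    diag = np.arange(logits.shape[0])
    # Protein -> ligand direction: each protein should pick out its own ligand.
    loss_p2l = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Ligand -> protein direction: each ligand should pick out its own protein.
    loss_l2p = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_p2l + loss_l2p)
```

With random toy embeddings, the loss is low when paired rows point in nearly the same direction and high when the pairing is shuffled, which is the behavior a contrastive objective is meant to reward. Averaging both directions is one plausible way to obtain a space that can be queried from either the protein side or the ligand side.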
ConGLUDe is aimed at early-stage drug discovery teams who must repeatedly ask related questions about proteins and ligands. Medicinal chemists can run virtual screens against a target of interest; teams investigating an active compound's mechanism or off-target liabilities can use target fishing to rank plausible protein targets; and structural and computational chemists can use ligand-conditioned pocket selection to identify the most relevant binding site without manually defining pockets. Because all three capabilities come from a single checkpoint operating zero-shot, the model is well suited to exploratory settings where building and maintaining separate task-specific pipelines would be costly.
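The three capabilities described above can each be framed as a ranking query in a shared embedding space. The sketch below illustrates that framing with hand-made two-dimensional vectors. Since no code or weights were public at the time of the preprint, everything here is a placeholder: `rank_by_similarity` is a hypothetical helper, the embeddings are invented toy values rather than model outputs, and cosine similarity is an assumed scoring function.

```python
import numpy as np


def rank_by_similarity(query_emb, candidate_embs, top_k=None):
    """Rank candidates by descending cosine similarity to a query embedding.

    Hypothetical helper for illustration; returns (indices, scores).
    """
    q = np.asarray(query_emb, dtype=np.float64)
    c = np.asarray(candidate_embs, dtype=np.float64)
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores, kind="stable")  # highest similarity first
    if top_k is not None:
        order = order[:top_k]
    return order, scores[order]


# Toy embeddings (invented values, NOT produced by ConGLUDe).
ligand_library = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
target_panel = np.array([[0.9, 0.1], [0.2, 0.8]])
candidate_pockets = np.array([[1.0, 0.0], [0.0, 1.0]])

# Virtual screening: rank ligands for one target.
screen_order, _ = rank_by_similarity(target_panel[0], ligand_library)

# Target fishing: rank targets for one ligand.
fishing_order, _ = rank_by_similarity(ligand_library[1], target_panel)

# Pocket selection: rank one protein's candidate pockets for one ligand.
pocket_order, _ = rank_by_similarity(np.array([0.2, 0.9]), candidate_pockets)

print(screen_order.tolist())   # [0, 2, 1]
print(fishing_order.tolist())  # [1, 0]
print(pocket_order.tolist())   # [1, 0]
```

The same function serves all three tasks; only the choice of query and candidate set changes, which is why a single checkpoint can plausibly cover them without task-specific pipelines.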
ConGLUDe's central contribution is methodological: it shows that a single contrastive geometric model can unify structure-based and ligand-based design and serve multiple discovery tasks zero-shot, lending weight to the broader push toward general-purpose foundation models in drug discovery. By learning from both structural complexes and abundant bioactivity data, it offers a pragmatic way to combine the precision of geometry with the scale of activity measurements. The key limitations are typical of a new preprint: the reported results are retrospective and benchmark-based rather than prospectively validated in the lab, and at the time of release no public code or weights were available, which constrains immediate reproduction and adoption.