bio.rodeo

ProtNHF

Oak Ridge National Laboratory

A neural Hamiltonian flow that generates protein sequences with continuous, inference-time control over composition and net charge via analytical bias potentials—no retraining required.

Released: March 2026

Generative models for protein sequences can sample plausible proteins, but steering them toward specific physicochemical targets—a desired amino acid composition, a particular net charge, or other compositional constraints—usually means conditioning the model during training or fine-tuning it for each new objective. ProtNHF takes a different route, framing controllable protein generation as a problem in Hamiltonian dynamics so that control can be applied entirely at inference time.

Developed at Oak Ridge National Laboratory and posted to bioRxiv in March 2026, ProtNHF (Protein Neural Hamiltonian Flows) learns a symplectic transport map that moves samples from a simple latent distribution to the space of protein embeddings. A transformer parameterizes a learned potential-energy function that, combined with a kinetic term, defines Hamiltonian dynamics integrated with a leapfrog scheme. Because the dynamics are governed by an energy function, external "bias potentials" can be added directly into the Hamiltonian at generation time to nudge sampling toward desired properties without altering or retraining the underlying model.
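One way to write down the setup described above, as a sketch rather than the preprint's own notation. The symbols $q$ (position in embedding space), $p$ (momentum), $U_\theta$ (the transformer potential), $U_{\text{bias}}$ (an analytical bias potential), $\lambda$ (bias strength), and $\varepsilon$ (leapfrog step size) are assumptions introduced here, as is the standard quadratic kinetic term with unit mass:

```latex
\begin{aligned}
H_\lambda(q, p) &= U_\theta(q) + \lambda\, U_{\text{bias}}(q) + \tfrac{1}{2}\, p^\top p, \\
p_{t+1/2} &= p_t - \tfrac{\varepsilon}{2}\, \nabla_q \bigl[ U_\theta + \lambda\, U_{\text{bias}} \bigr](q_t), \\
q_{t+1} &= q_t + \varepsilon\, p_{t+1/2}, \\
p_{t+1} &= p_{t+1/2} - \tfrac{\varepsilon}{2}\, \nabla_q \bigl[ U_\theta + \lambda\, U_{\text{bias}} \bigr](q_{t+1}).
\end{aligned}
```

Under these assumptions, setting $\lambda = 0$ recovers the unbiased dynamics, and increasing $\lambda$ strengthens the pull toward the target property without touching the parameters $\theta$.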

This places ProtNHF in the family of flow- and energy-based generative models for proteins, but with an unusual emphasis: smooth, quantitative, post-hoc control derived from the physics-inspired structure of Hamiltonian flows rather than from learned conditioning.

#Key Features

  • Inference-time controllability: Analytical bias potentials are injected into the Hamiltonian at sampling time, so properties like composition and net charge can be tuned without modifying or retraining the learned model.
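  • Symplectic transport map: The model learns a reversible, volume-preserving map from a latent distribution to protein embeddings, integrated with leapfrog dynamics. The continuous Hamiltonian flow conserves energy exactly; leapfrog integration conserves it only approximately.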
  • Transformer potential: A transformer parameterizes the potential-energy function, supplying the expressive, sequence-aware energy landscape that drives the flow.
  • Smooth property steering: Control over amino acid composition and net charge is continuous and quantitative, allowing graded rather than all-or-nothing constraints while preserving sequence diversity.
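The reversibility of a leapfrog-integrated flow can be checked numerically. The following is a minimal NumPy sketch, not the ProtNHF implementation: the quartic `potential` is a toy stand-in for the transformer potential, and the function names, step size, and dimensions are all assumptions chosen for illustration.

```python
import numpy as np

def leapfrog(q, p, grad_u, step=0.05, n_steps=40):
    """Leapfrog (velocity Verlet) integration of dq/dt = p, dp/dt = -grad_u(q)."""
    q, p = q.copy(), p.copy()
    for _ in range(n_steps):
        p = p - 0.5 * step * grad_u(q)   # half step in momentum
        q = q + step * p                 # full step in position
        p = p - 0.5 * step * grad_u(q)   # half step in momentum
    return q, p

# Toy stand-in for the learned potential (assumption): U(q) = sum(q^4)/4 + sum(q^2)/2
def potential(q):
    return 0.25 * np.sum(q**4) + 0.5 * np.sum(q**2)

def grad_potential(q):
    return q**3 + q

def hamiltonian(q, p):
    # Potential energy plus a unit-mass quadratic kinetic term (assumption)
    return potential(q) + 0.5 * np.sum(p**2)

rng = np.random.default_rng(0)
q0 = rng.normal(size=16)
p0 = rng.normal(size=16)

# Forward pass: latent-side state -> transported state
q1, p1 = leapfrog(q0, p0, grad_potential)

# Reverse pass: flip the momentum, integrate again, flip it back
q_back, p_back = leapfrog(q1, -p1, grad_potential)
p_back = -p_back

roundtrip_error = np.max(np.abs(q_back - q0))
energy_drift = abs(hamiltonian(q1, p1) - hamiltonian(q0, p0))
print(f"round-trip error: {roundtrip_error:.2e}")
print(f"energy drift:     {energy_drift:.2e}")
```

The round trip recovers the starting point up to floating-point error, which is what makes the map invertible. The energy drift is small but generally not zero, because leapfrog conserves the Hamiltonian only approximately.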

#Technical Details

ProtNHF combines a learned, transformer-based potential energy with a kinetic term to construct a Hamiltonian whose dynamics define an invertible, volume-preserving flow. Training fits this neural Hamiltonian so that deterministic leapfrog integration maps samples from a latent distribution onto the distribution of protein sequence embeddings. At inference, analytical bias functions—encoding objectives such as target amino acid composition or net charge—are added to the Hamiltonian, biasing the integrated trajectories toward sequences satisfying those constraints. The authors report that this steering maintains sequence validity and diversity while delivering continuous control, demonstrating the approach on compositional and charge targets without any retraining of the base model.
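To make the steering mechanism concrete, here is a toy NumPy sketch of adding an analytical bias to a fixed potential at sampling time. Everything specific in it is an assumption: the quadratic "learned" potential replaces the transformer, the linear readout `w` is a hypothetical stand-in for a property such as net charge, and the quadratic bias form, `target`, and `strength` are illustrative choices rather than the bias functions used in the preprint.

```python
import numpy as np

def leapfrog(q, p, grad_u, step=0.05, n_steps=20):
    """Leapfrog integration of dq/dt = p, dp/dt = -grad_u(q) for a batch of states."""
    q, p = q.copy(), p.copy()
    for _ in range(n_steps):
        p = p - 0.5 * step * grad_u(q)
        q = q + step * p
        p = p - 0.5 * step * grad_u(q)
    return q, p

rng = np.random.default_rng(0)
dim, n_samples = 32, 1000

# Hypothetical linear property readout (stand-in for e.g. net charge): prop(q) = q @ w
w = rng.normal(size=dim)
w /= np.linalg.norm(w)

def grad_learned(q):
    # Toy stand-in for the trained potential: U(q) = 0.5 * |q|^2, kept fixed throughout
    return q

def make_grad_biased(target, strength):
    # U_bias(q) = 0.5 * strength * (q @ w - target)^2, added to the fixed potential
    def grad(q):
        residual = q @ w - target                      # shape (n_samples,)
        return grad_learned(q) + strength * residual[:, None] * w
    return grad

# Same latent positions and momenta for both runs, so only the bias differs
q0 = rng.normal(size=(n_samples, dim))
p0 = rng.normal(size=(n_samples, dim))

q_plain, _ = leapfrog(q0, p0, grad_learned)
q_steered, _ = leapfrog(q0, p0, make_grad_biased(target=2.0, strength=3.0))

mean_plain = float(np.mean(q_plain @ w))
mean_steered = float(np.mean(q_steered @ w))
print(f"mean property without bias: {mean_plain:+.2f}")
print(f"mean property with bias:    {mean_steered:+.2f}")
```

In this toy setting the bias shifts the ensemble's mean property toward the target while `grad_learned` is never modified, and `strength` acts as a continuous dial: at zero the two runs coincide. How closely a real model tracks a target would depend on the bias form and integration length, which this sketch does not attempt to reproduce.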

#Applications

ProtNHF is intended for protein designers who need to bias sequence generation toward specific biophysical properties—for example, tuning net charge for solubility or purification, or constraining amino acid composition to meet expression or formulation requirements. Because control is applied at inference time, a single trained model can serve many design campaigns with different objectives, which is attractive for high-throughput in silico screening and for exploratory design where targets shift frequently. The approach is most directly useful to computational protein engineers and groups exploring energy- and physics-inspired generative methods.
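For screening generated sequences against charge or composition targets, a simple post-hoc check is often enough. The sketch below is generic Python, not part of ProtNHF: it uses a deliberately simplified charge model (Lys and Arg as +1, Asp and Glu as -1, histidine and the termini ignored), and the helper names and example sequences are hypothetical.

```python
from collections import Counter

# Simplified side-chain charges near neutral pH (assumption: His neutral, termini ignored)
RESIDUE_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def net_charge(sequence: str) -> int:
    """Approximate net charge as (#K + #R) - (#D + #E)."""
    return sum(RESIDUE_CHARGE.get(aa, 0) for aa in sequence.upper())

def composition(sequence: str) -> dict[str, float]:
    """Fraction of each amino acid present in a non-empty sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())}

def filter_by_charge(sequences, low, high):
    """Keep sequences whose approximate net charge lies in [low, high]."""
    return [s for s in sequences if low <= net_charge(s) <= high]

# Made-up example sequences, not model outputs
candidates = ["MKKRLEDK", "MDEEALDE", "MGSAGLVA"]

for seq in candidates:
    print(seq, net_charge(seq))

print(filter_by_charge(candidates, 1, 3))
```

A pH-dependent calculation with residue pKa values would be more accurate; the integer count here is only meant to show where such a filter sits in a screening loop.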

#Impact

ProtNHF illustrates how a physics-inspired generative architecture can provide controllability at inference with no additional training, decoupling property steering from model training and avoiding the cost of retraining for each new objective. If the approach generalizes, it offers a template for adding interpretable, analytically specified constraints to protein generative models more broadly. Two caveats apply. First, as an early preprint without a confirmed public code or weights release, its practical performance relative to conditioned diffusion and autoregressive designers has not yet been benchmarked independently. Second, the controls demonstrated so far cover composition and net charge only, not structure-level objectives.

Tags

protein_design, de_novo_design, flow_matching, transformer, generative, representation_learning, proteomics