A neural Hamiltonian flow that generates protein sequences with continuous, inference-time control over composition and net charge via analytical bias potentials—no retraining required.
Generative models for protein sequences can sample plausible proteins, but steering them toward specific physicochemical targets—a desired amino acid composition, a particular net charge, or other compositional constraints—usually means conditioning the model during training or fine-tuning it for each new objective. ProtNHF takes a different route, framing controllable protein generation as a problem in Hamiltonian dynamics so that control can be applied entirely at inference time.
Developed at Oak Ridge National Laboratory and posted to bioRxiv in March 2026, ProtNHF (Protein Neural Hamiltonian Flows) learns a symplectic transport map that moves samples from a simple latent distribution to the space of protein embeddings. A transformer parameterizes a learned potential-energy function that, combined with a kinetic term, defines Hamiltonian dynamics integrated with a leapfrog scheme. Because the dynamics are governed by an energy function, external "bias potentials" can be added directly to the Hamiltonian at generation time, steering sampling toward desired properties without altering or retraining the underlying model.
This places ProtNHF in the family of flow- and energy-based generative models for proteins, but with an unusual emphasis: smooth, quantitative, post-hoc control derived from the physics-inspired structure of Hamiltonian flows rather than from learned conditioning.
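The leapfrog integration of a Hamiltonian can be illustrated with a minimal NumPy sketch. This is not ProtNHF's implementation: the quadratic `grad_potential` is a hypothetical stand-in for the transformer-parameterized potential, the kinetic term is assumed to be the standard 0.5 * |p|^2, and the step size and step count are arbitrary. The sketch shows the property such a flow relies on: integrating forward, flipping the momentum, and integrating again returns to the starting point, so the map is invertible.

```python
import numpy as np


def grad_potential(q):
    # Hypothetical toy potential U(q) = 0.5 * |q|^2, so grad U(q) = q.
    # In ProtNHF this role is played by a learned transformer potential.
    return q


def hamiltonian(q, p):
    # H(q, p) = U(q) + K(p), with the toy U above and K(p) = 0.5 * |p|^2 (assumed).
    return 0.5 * np.dot(q, q) + 0.5 * np.dot(p, p)


def leapfrog(q, p, grad_u, step_size=0.1, n_steps=20):
    """Integrate Hamilton's equations for H = U(q) + 0.5 * |p|^2."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_u(q)      # initial half step in momentum
    for i in range(n_steps):
        q += step_size * p                # full step in position
        if i < n_steps - 1:
            p -= step_size * grad_u(q)    # full step in momentum
    p -= 0.5 * step_size * grad_u(q)      # final half step in momentum
    return q, p


rng = np.random.default_rng(0)
q0 = rng.standard_normal(8)               # latent "position" sample
p0 = rng.standard_normal(8)               # latent "momentum" sample

# Forward transport.
q1, p1 = leapfrog(q0, p0, grad_potential)

# Inverse transport: flip the momentum, integrate, flip it back.
q_back, p_back = leapfrog(q1, -p1, grad_potential)
p_back = -p_back

energy_drift = hamiltonian(q1, p1) - hamiltonian(q0, p0)
print("max reconstruction error:", np.abs(q_back - q0).max())
print("energy drift:", energy_drift)
```

The reconstruction error should sit at floating-point precision, while the energy drift is small but nonzero, as expected for a finite step size.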
ProtNHF combines a learned, transformer-based potential energy with a kinetic term to construct a Hamiltonian whose dynamics define an invertible, volume-preserving flow. Training fits this neural Hamiltonian so that deterministic leapfrog integration maps samples from a latent distribution onto the distribution of protein sequence embeddings. At inference, analytical bias functions—encoding objectives such as target amino acid composition or net charge—are added to the Hamiltonian, biasing the integrated trajectories toward sequences satisfying those constraints. The authors report that this steering maintains sequence validity and diversity while delivering continuous control, demonstrating the approach on compositional and charge targets without any retraining of the base model.
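To make the idea of an analytical bias concrete, the sketch below adds a net-charge bias to a base potential gradient. Everything specific here is an assumption for illustration and is not taken from the preprint: the state is treated as per-position logits over 20 amino acids, the expected net charge is computed from a softmax over those logits, the bias is a quadratic penalty on the distance to a target charge, and the charge table is simplified (histidine neutral, termini ignored). The names `charge_bias`, `grad_charge_bias`, and `biased_grad` are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Simplified side-chain charges near neutral pH (assumption: His treated as neutral).
CHARGE = np.array(
    [{"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}.get(a, 0.0) for a in AMINO_ACIDS]
)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def expected_net_charge(logits):
    # Sum over positions of the expected charge under the soft residue distribution.
    return float((softmax(logits) @ CHARGE).sum())


def charge_bias(logits, target):
    # Quadratic bias potential B(q) = 0.5 * (Q(q) - target)^2.
    return 0.5 * (expected_net_charge(logits) - target) ** 2


def grad_charge_bias(logits, target):
    # Analytical gradient: dB/dz[l, a] = (Q - target) * p[l, a] * (w[a] - c[l]),
    # where c[l] is the expected charge at position l.
    probs = softmax(logits)
    per_position = probs @ CHARGE
    d_charge = probs * (CHARGE[None, :] - per_position[:, None])
    return (per_position.sum() - target) * d_charge


def biased_grad(logits, grad_base, target, strength):
    # Gradient of the biased potential U(q) + strength * B(q).
    # strength = 0 recovers the unmodified base model.
    return grad_base(logits) + strength * grad_charge_bias(logits, target)


rng = np.random.default_rng(1)
logits = rng.standard_normal((5, 20))       # 5 positions, 20 residue types
print("expected net charge:", expected_net_charge(logits))
print("bias at target +2:", charge_bias(logits, target=2.0))
```

Because the bias only enters through its gradient, a function like `biased_grad` could be passed to a leapfrog integrator in place of the base gradient, with `strength` acting as a continuous control knob; the base potential itself is never modified.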
ProtNHF is intended for protein designers who need to bias sequence generation toward specific biophysical properties—for example, tuning net charge for solubility or purification, or constraining amino acid composition to meet expression or formulation requirements. Because control is applied at inference time, a single trained model can serve many design campaigns with different objectives, which is attractive for high-throughput in silico screening and for exploratory design where targets shift frequently. The approach is most directly useful to computational protein engineers and groups exploring energy- and physics-inspired generative methods.
ProtNHF illustrates how physics-inspired generative architectures can deliver controllability "for free" at inference, decoupling property steering from model training and avoiding the cost of retraining for each new objective. If the approach generalizes, it offers a template for adding interpretable, analytically specified constraints to protein generative models more broadly. Because it is an early preprint with no confirmed public release of code or weights, its practical performance relative to conditioned diffusion and autoregressive designers has yet to be benchmarked independently, and the controls demonstrated so far cover composition and net charge rather than structure-level objectives.