Profluent
The first AI-designed CRISPR-Cas gene editor to successfully edit the human genome, generated by protein language models trained on over 1.2 million CRISPR operons and released open-source.
OpenCRISPR-1 is the first AI-designed CRISPR-Cas gene editor demonstrated to successfully edit the human genome, developed by Profluent using large protein language models trained on the largest curated collection of CRISPR-associated protein sequences assembled to date. Originally released as a bioRxiv preprint in April 2024 and subsequently published in Nature in 2025, OpenCRISPR-1 represents a landmark demonstration of what generative AI can produce in protein engineering: a functional gene editor that maintains the architecture of a Type II CRISPR-Cas9 nuclease but differs from the canonical SpCas9 by more than 400 amino acid mutations — and differs from every known natural CRISPR-associated protein by at least 200 mutations. The protein is, in the precise technical sense, new-to-nature: no natural evolution produced it, and its sequence could not have been identified by searching existing genomic databases.
The development of OpenCRISPR-1 emerged from a recognition that the natural diversity of CRISPR-Cas systems, while remarkable, represents only a small fraction of the functional protein sequence space accessible to Cas-like architectures. SpCas9, the dominant gene editor in research and therapeutic development, was identified from Streptococcus pyogenes and has been optimized through years of directed evolution and rational engineering. It nonetheless carries limitations: immunogenicity in human patients (who frequently carry antibodies against Streptococcus proteins), off-target cleavage at unintended genomic sites, and PAM sequence constraints that limit the genomic sites accessible for editing. Alternative natural CRISPR editors, including SaCas9, LbCas12a, and AsCas12a, offer different trade-offs but remain constrained by the same fundamental limitation: they represent whatever evolution produced in the particular ecological niches occupied by the source organisms, not the full space of functional architectures that a Cas-family protein could occupy.
Profluent approached this problem by training protein language models on a massive, custom dataset of CRISPR operons mined from microbial genomes, then using generative sampling to explore the much larger space of plausible but unobserved Cas-family protein sequences. From 350,000 generated synthetic sequences filtered for compatibility with CRISPR system requirements, 209 candidates were selected for functional testing in human cells. OpenCRISPR-1 is one of these candidates — notable for combining on-target editing efficiency comparable to SpCas9 with a 95% reduction in off-target cleavage and substantially lower immunogenicity, a combination of properties that no single natural CRISPR editor possesses. Profluent released OpenCRISPR-1 as open-source under a permissive license allowing both research and commercial use, with the explicit goal of enabling broad access to AI-designed gene editing tools.
The computational pipeline underlying OpenCRISPR-1 begins with large-scale mining of CRISPR-Cas sequences from microbial genomic databases. Profluent curated the CRISPR-Cas Atlas by mining 26.2 terabases of assembled microbial genomes and metagenomes, yielding a dataset of over 1.2 million CRISPR operons and more than 240,000 Cas9-family sequences. This dataset represents a substantially more comprehensive survey of natural CRISPR diversity than was previously available and was specifically designed to capture the evolutionary diversity of Cas protein architectures across bacteria and archaea at scale.
The generative model used to produce OpenCRISPR-1 is a large protein language model in the ProGen2 family — an autoregressive transformer trained on broad protein sequence databases that was subsequently fine-tuned specifically on the CRISPR-Cas Atlas sequences. Fine-tuning on Cas-specific sequences provides the model with a prior over the statistical regularities of functional Cas architectures: conserved catalytic residues, structural domain boundaries, guide RNA interaction interfaces, and PAM recognition elements. Once fine-tuned, the model was used to generate 350,000 synthetic Cas9-like sequences by autoregressive sampling, conditioned on partial sequences or sequence motifs that specify key functional requirements.
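The sampling step described above can be illustrated with a toy example. This is a minimal sketch of prompt-conditioned autoregressive sampling with a temperature parameter, not Profluent's model or code: `toy_next_residue_logits` is a hypothetical stand-in for a trained protein language model, and the prompt, length, and temperature values are arbitrary assumptions.

```python
import math
import random

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def toy_next_residue_logits(prefix):
    """Hypothetical stand-in for a trained language model.

    Returns one logit per amino acid given the sequence so far.
    This toy version only mildly favors repeating the last residue.
    """
    logits = {aa: 0.0 for aa in AMINO_ACIDS}
    if prefix:
        logits[prefix[-1]] += 1.0
    return logits


def sample_sequence(prompt, total_length, temperature=1.0, rng=None):
    """Extend `prompt` one residue at a time until `total_length` is reached."""
    rng = rng or random.Random()
    seq = list(prompt)
    while len(seq) < total_length:
        logits = toy_next_residue_logits(seq)
        # Temperature-scaled softmax, shifted by the max for numerical stability.
        scaled = [logits[aa] / temperature for aa in AMINO_ACIDS]
        top = max(scaled)
        weights = [math.exp(s - top) for s in scaled]
        seq.append(rng.choices(AMINO_ACIDS, weights=weights, k=1)[0])
    return "".join(seq)


if __name__ == "__main__":
    # The prompt plays the role of the conditioning partial sequence.
    print(sample_sequence("MDKK", total_length=40, temperature=0.8,
                          rng=random.Random(0)))
```

Lower temperatures concentrate sampling on the model's most probable residues, while higher temperatures produce more divergent sequences. In a real pipeline the logits would come from the fine-tuned model rather than a hand-written rule.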
The 350,000 generated sequences were filtered through a computational pipeline assessing sequence quality, predicted structural plausibility, and CRISPR system compatibility — resulting in 209 candidate proteins selected for experimental characterization. These candidates were synthesized and transfected into human cells, where their editing efficiency at target genomic sites was measured by sequencing. OpenCRISPR-1 was identified from this screen as a high-performing candidate with a combination of on-target efficiency, off-target specificity, and predicted structural characteristics warranting further detailed characterization. Off-target analysis was performed using GUIDE-seq, a genome-wide method for detecting double-strand breaks, providing an unbiased assessment of editing specificity across the human genome rather than a limited panel of computationally predicted sites. Immunogenicity was assessed through iELISA quantification across 40 human donors representing diverse population backgrounds, with OpenCRISPR-1 showing consistently lower antibody reactivity than SpCas9.
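Going from 350,000 generated sequences to 209 tested candidates means that only about 0.06% of the generated pool reached the bench. The computational filtering stage can be pictured as a funnel of successive checks, sketched below. The specific filters, thresholds, and the `structure_score` field are hypothetical illustrations of the three criteria named above (sequence quality, structural plausibility, CRISPR compatibility), since the exact rules are not given here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH, MAX_LENGTH = 1000, 1600  # hypothetical length window (residues)
MIN_STRUCTURE_SCORE = 0.8            # hypothetical structure-confidence cutoff


@dataclass(frozen=True)
class Candidate:
    name: str
    sequence: str
    structure_score: float  # hypothetical predicted-structure confidence, 0..1


Filter = Tuple[str, Callable[[Candidate], bool]]

# Each filter is a (label, keep-predicate) pair, applied in order.
FILTERS: List[Filter] = [
    ("valid residues",
     lambda c: bool(c.sequence) and set(c.sequence) <= VALID_RESIDUES),
    ("length window",
     lambda c: MIN_LENGTH <= len(c.sequence) <= MAX_LENGTH),
    ("structure score",
     lambda c: c.structure_score >= MIN_STRUCTURE_SCORE),
]


def run_funnel(candidates: Sequence[Candidate],
               filters: Sequence[Filter] = FILTERS):
    """Apply filters in order; return survivors and per-stage counts."""
    survivors = list(candidates)
    counts = [("generated", len(survivors))]
    for label, keep in filters:
        survivors = [c for c in survivors if keep(c)]
        counts.append((label, len(survivors)))
    return survivors, counts


if __name__ == "__main__":
    pool = [
        Candidate("a", "M" * 1200, 0.90),
        Candidate("b", "M" * 500, 0.95),   # too short
        Candidate("c", "MX" * 600, 0.90),  # non-standard residue
        Candidate("d", "M" * 1300, 0.50),  # low structure score
    ]
    survivors, counts = run_funnel(pool)
    for label, n in counts:
        print(f"{label}: {n}")
```

Recording the count after each stage makes it easy to see which check removes the most candidates, which is useful when tuning thresholds.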
The Nature 2025 publication extends the original preprint with additional experimental results, including enhanced variants with further improved editing characteristics, demonstrating that the generative AI approach enables iterative optimization — not just discovery of a single candidate, but a platform for exploring the functional landscape of Cas-like proteins in a directed manner.
OpenCRISPR-1 is immediately applicable to any research or therapeutic context that currently uses SpCas9 or other Type II CRISPR nucleases, with the advantage of improved specificity and potentially reduced immunogenicity. For basic research laboratories, NGG PAM compatibility and guide RNA compatibility with SpCas9 protocols mean that OpenCRISPR-1 can be adopted without redesigning existing guide RNA libraries or workflows, which lowers the barrier to adoption for groups that want to benefit from reduced off-target editing. In gene therapy development, the reduced immunogenicity profile of OpenCRISPR-1 addresses one of the most significant clinical challenges facing CRISPR-based therapeutics: pre-existing immunity against SpCas9 in a substantial fraction of the human population limits the patient populations that can safely receive SpCas9-based treatments. An AI-designed editor with lower immunogenicity could extend the eligible patient population for CRISPR therapeutics currently in clinical development. The 95% off-target reduction is particularly relevant for therapeutic applications, where unintended edits at non-target genomic sites are safety risks that must be minimized. Agricultural and industrial biotechnology researchers can apply OpenCRISPR-1 as a high-specificity genome editing tool in non-human organisms, where the protein's divergence from known natural proteins may provide advantages in contexts where natural CRISPR editors perform inconsistently.
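Because OpenCRISPR-1 recognizes NGG PAMs, candidate target sites can be enumerated the same way as for SpCas9. The following is a minimal sketch that scans both strands of a DNA string for a protospacer followed by an NGG PAM. The 20-nucleotide spacer length is the common SpCas9 convention and is assumed here; the function names are illustrative, and real guide design tools additionally score efficiency and off-target risk.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def find_ngg_targets(dna, spacer_len=20):
    """List (strand, start, protospacer, pam) for every NGG-adjacent site.

    `start` is the 0-based protospacer start in the coordinates of the
    scanned strand: the input for "+", its reverse complement for "-".
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # The last valid start leaves room for the spacer plus a 3-nt PAM.
        for i in range(len(seq) - spacer_len - 2):
            pam = seq[i + spacer_len:i + spacer_len + 3]
            if pam[1:] == "GG":  # N can be any base, followed by GG
                hits.append((strand, i, seq[i:i + spacer_len], pam))
    return hits


if __name__ == "__main__":
    example = "ACGT" * 5 + "TGG" + "CCA" + "TTTT" * 5
    for strand, start, spacer, pam in find_ngg_targets(example):
        print(strand, start, spacer, pam)
```

A guide RNA library designed this way for SpCas9 would, per the compatibility described above, be reusable with OpenCRISPR-1 without changes to the spacer sequences.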
OpenCRISPR-1 is a landmark result in AI-driven protein design because it demonstrates, for the first time, that generative protein language models can produce functional gene editing machinery of sufficient quality to edit the human genome with performance metrics that surpass the natural prototype in clinically relevant dimensions. The 95% off-target reduction relative to SpCas9 is not a marginal improvement — it is the difference between a research tool and a potential therapeutic agent for many applications. The open-source release under a permissive license distinguishes OpenCRISPR-1 from many AI-designed proteins that remain proprietary, and the tens of thousands of researchers who accessed the sequence within the first year of release reflect genuine demand for accessible, high-quality AI-designed biological tools. The work also establishes a generalizable platform: the combination of large-scale CRISPR operon mining, protein language model fine-tuning, and high-throughput functional screening is not specific to SpCas9 and could in principle be extended to design editors with novel PAM specificities, altered target range, activity in new cellular contexts, or other properties not easily accessible through natural CRISPR diversity. Profluent's subsequent work on this platform, including expanded functional testing data in the Nature publication, suggests that the initial OpenCRISPR-1 result is the first output of a broader enterprise of AI-driven gene editor design. A key limitation is that the model and screening pipeline optimize for a combination of efficiency and specificity measurable in standard cell line assays, and performance in the in vivo contexts relevant to gene therapy — including delivery constraints, chromatin accessibility variation, and tissue-specific expression — requires additional validation.