bio.rodeo

Protein

OpenCRISPR-1

Profluent

The first AI-designed CRISPR-Cas gene editor to successfully edit the human genome, generated by protein language models trained on over 1.2 million CRISPR operons and released open-source.

Released: 2024

Overview

OpenCRISPR-1 is the first AI-designed CRISPR-Cas gene editor demonstrated to successfully edit the human genome. Profluent developed it using large protein language models trained on the largest curated collection of CRISPR-associated protein sequences assembled to date. Originally released as a bioRxiv preprint in April 2024 and subsequently published in Nature in 2025, OpenCRISPR-1 is a landmark demonstration of what generative AI can produce in protein engineering: a functional gene editor that keeps the architecture of a Type II CRISPR-Cas9 nuclease but differs from the canonical SpCas9 by more than 400 amino acid mutations, and from every known natural CRISPR-associated protein by at least 200 mutations. The protein is new-to-nature in the precise technical sense: natural evolution did not produce it, and its sequence could not have been found by searching existing genomic databases.

OpenCRISPR-1 grew out of the recognition that the natural diversity of CRISPR-Cas systems, while remarkable, covers only a small fraction of the functional sequence space accessible to Cas-like architectures. SpCas9, the dominant gene editor in research and therapeutic development, was identified in Streptococcus pyogenes and has been optimized through more than a decade of directed evolution and rational engineering. It still carries limitations: immunogenicity in human patients (who frequently carry antibodies against Streptococcus proteins), off-target cleavage at unintended genomic sites, and PAM sequence constraints that limit which genomic sites can be edited. Alternative natural CRISPR editors such as SaCas9, LbCas12a, and AsCas12a offer different trade-offs but share the same fundamental limitation: they are whatever evolution produced in the ecological niches of their source organisms, not the full space of functional architectures that a Cas-family protein could occupy.

Profluent approached this problem by training protein language models on a massive, custom dataset of CRISPR operons mined from microbial genomes, then using generative sampling to explore the much larger space of plausible but unobserved Cas-family protein sequences. From 350,000 generated synthetic sequences filtered for compatibility with CRISPR system requirements, 209 candidates were selected for functional testing in human cells. OpenCRISPR-1 is one of these candidates — notable for combining on-target editing efficiency comparable to SpCas9 with a 95% reduction in off-target cleavage and substantially lower immunogenicity, a combination of properties that no single natural CRISPR editor possesses. Profluent released OpenCRISPR-1 as open-source under a permissive license allowing both research and commercial use, with the explicit goal of enabling broad access to AI-designed gene editing tools.

Key Features

  • More than 400 mutations from SpCas9: OpenCRISPR-1 maintains the prototypical bilobed architecture of a Type II Cas9 nuclease — recognition lobe plus nuclease lobe — but diverges from SpCas9 by over 400 amino acid mutations and from any known natural CRISPR-associated protein by at least 200 mutations, representing a genuinely novel sequence identity in functional protein space.
  • Comparable on-target editing efficiency: In human cell editing assays, OpenCRISPR-1 achieves 55.7% mean on-target editing efficiency compared to 48.3% for SpCas9 — demonstrating that AI-designed proteins can match or exceed the performance of natural proteins optimized over millions of years of evolution.
  • 95% reduction in off-target cleavage: OpenCRISPR-1 demonstrates 0.32% off-target editing compared to 6.1% for SpCas9, a 95% reduction that reflects improved specificity rather than reduced nuclease activity — and its off-target sites are a strict subset of SpCas9's, with no new off-target sites introduced by the AI design process.
  • Reduced immunogenicity: Antibody assays across 40 human donors show consistently lower immune reactivity to OpenCRISPR-1 compared to SpCas9, and T-cell epitope analysis indicates the absence of known SpCas9 immunogenic epitopes — a clinically significant property for in vivo therapeutic applications where pre-existing immunity to bacterial Cas proteins can limit efficacy and safety.
  • NGG PAM compatibility with SpCas9 guide RNAs: OpenCRISPR-1 retains the NGG PAM preference of SpCas9 and can function with canonical SpCas9 guide RNAs, making it a near-drop-in replacement for many existing experimental protocols without requiring redesign of guide RNA libraries or delivery systems.
  • Open-source release: The protein and guide RNA sequences for OpenCRISPR-1 are freely available under a permissive license for ethical research and commercial use, and tens of thousands of researchers have accessed the sequence since its initial release.
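
The headline percentages in this list can be sanity-checked with a few lines of arithmetic. The sketch below recomputes the relative off-target reduction and the on-target ratio from the figures quoted above. The function and constant names are illustrative only and are not part of any Profluent tooling.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of `new` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


# Percentages quoted in the feature list above.
SPCAS9_OFF_TARGET = 6.1
OPENCRISPR1_OFF_TARGET = 0.32
SPCAS9_ON_TARGET = 48.3
OPENCRISPR1_ON_TARGET = 55.7

off_target_reduction = relative_reduction(SPCAS9_OFF_TARGET, OPENCRISPR1_OFF_TARGET)
on_target_ratio = OPENCRISPR1_ON_TARGET / SPCAS9_ON_TARGET

print(f"off-target reduction: {off_target_reduction:.1%}")   # 94.8%
print(f"on-target ratio vs SpCas9: {on_target_ratio:.2f}x")  # 1.15x
```

The computed reduction of about 94.8% is consistent with the rounded 95% figure.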

Technical Details

The computational pipeline underlying OpenCRISPR-1 begins with large-scale mining of CRISPR-Cas sequences from microbial genomic databases. Profluent curated the CRISPR-Cas Atlas by mining 26.2 terabases of assembled microbial genomes and metagenomes, yielding a dataset of over 1.2 million CRISPR operons and more than 240,000 Cas9-family sequences. This dataset represents a substantially more comprehensive survey of natural CRISPR diversity than was previously available and was specifically designed to capture the evolutionary diversity of Cas protein architectures across bacteria and archaea at scale.

The generative model used to produce OpenCRISPR-1 is a large protein language model in the ProGen2 family — an autoregressive transformer trained on broad protein sequence databases that was subsequently fine-tuned specifically on the CRISPR-Cas Atlas sequences. Fine-tuning on Cas-specific sequences provides the model with a prior over the statistical regularities of functional Cas architectures: conserved catalytic residues, structural domain boundaries, guide RNA interaction interfaces, and PAM recognition elements. Once fine-tuned, the model was used to generate 350,000 synthetic Cas9-like sequences by autoregressive sampling, conditioned on partial sequences or sequence motifs that specify key functional requirements.
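
To make "autoregressive sampling conditioned on a partial sequence" concrete, here is a deliberately tiny sketch. It is not ProGen2 and not Profluent's pipeline: a bigram count table stands in for the fine-tuned transformer, and the training strings are arbitrary made-up peptides, not real Cas9 fragments. What it shares with the real procedure is the loop structure: start from a prompt, sample the next residue from a distribution conditioned on what came before, append, and repeat.

```python
import random
from collections import Counter, defaultdict


def fit_bigram_model(sequences):
    """Count how often each residue follows each other residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_continuation(counts, prompt, max_len, temperature=1.0, rng=None):
    """Extend `prompt` one residue at a time, sampling each from the model."""
    if not prompt:
        raise ValueError("prompt must contain at least one residue")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    seq = prompt
    while len(seq) < max_len:
        options = counts.get(seq[-1])
        if not options:  # residue never seen with a successor: stop early
            break
        residues = list(options)
        # Raising counts to 1/T is equivalent to temperature-scaling log-probabilities.
        weights = [options[r] ** (1.0 / temperature) for r in residues]
        seq += rng.choices(residues, weights=weights, k=1)[0]
    return seq


# Arbitrary toy strings (hypothetical, for illustration only).
TOY_TRAINING_SET = ["MKTAYIAKQR", "MKTLYIGKQW", "MSTAYLAKER"]

model = fit_bigram_model(TOY_TRAINING_SET)
generated = sample_continuation(model, prompt="MK", max_len=12, rng=random.Random(0))
print(generated)
```

A real protein language model conditions on the entire preceding sequence rather than only the last residue, which is what lets it capture long-range constraints such as domain boundaries and catalytic residues.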

The 350,000 generated sequences were filtered through a computational pipeline assessing sequence quality, predicted structural plausibility, and CRISPR system compatibility — resulting in 209 candidate proteins selected for experimental characterization. These candidates were synthesized and transfected into human cells, where their editing efficiency at target genomic sites was measured by sequencing. OpenCRISPR-1 was identified from this screen as a high-performing candidate with a combination of on-target efficiency, off-target specificity, and predicted structural characteristics warranting further detailed characterization. Off-target analysis was performed using GUIDE-seq, a genome-wide method for detecting double-strand breaks, providing an unbiased assessment of editing specificity across the human genome rather than a limited panel of computationally predicted sites. Immunogenicity was assessed through iELISA quantification across 40 human donors representing diverse population backgrounds, with OpenCRISPR-1 showing consistently lower antibody reactivity than SpCas9.
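
The funnel from generated sequences to tested candidates can be pictured as a chain of filters, each discarding sequences that fail one check. The sketch below shows that shape only. The three checks (alphabet, length window, homopolymer runs) and the toy candidates are hypothetical placeholders, since the source does not specify the actual criteria beyond sequence quality, structural plausibility, and CRISPR system compatibility.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def has_valid_alphabet(seq):
    """Reject empty sequences and any non-standard residue codes."""
    return bool(seq) and set(seq) <= VALID_RESIDUES


def within_length(seq, lo=5, hi=12):
    """Toy length window; real Cas9-like proteins are far longer."""
    return lo <= len(seq) <= hi


def no_long_runs(seq, max_run=3):
    """Reject sequences with a single residue repeated more than max_run times."""
    longest, run, prev = 0, 0, None
    for ch in seq:
        run = run + 1 if ch == prev else 1
        longest = max(longest, run)
        prev = ch
    return longest <= max_run


def run_funnel(candidates, stages):
    """Apply each (name, predicate) stage in order and record survivor counts."""
    survivors = list(candidates)
    report = [("input", len(survivors))]
    for name, keep in stages:
        survivors = [s for s in survivors if keep(s)]
        report.append((name, len(survivors)))
    return survivors, report


# Hypothetical stages and toy candidates, for illustration only.
STAGES = [
    ("valid alphabet", has_valid_alphabet),
    ("length window", within_length),
    ("no long homopolymer runs", no_long_runs),
]
toy_candidates = ["MKTAYIAK", "MKXAYIAK", "MK", "MKAAAAAK", "MSTLYLGKE"]

survivors, report = run_funnel(toy_candidates, STAGES)
for name, count in report:
    print(f"{name}: {count}")

# Overall pass rate implied by the figures in the text.
screened_fraction = 209 / 350_000
print(f"fraction selected for testing: {screened_fraction:.3%}")
```

With the numbers quoted above, roughly 0.06% of generated sequences reached functional testing, which shows how much of the selection burden falls on the computational filters.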

The Nature 2025 publication extends the original preprint with additional experimental results, including enhanced variants with further improved editing characteristics, demonstrating that the generative AI approach enables iterative optimization — not just discovery of a single candidate, but a platform for exploring the functional landscape of Cas-like proteins in a directed manner.

Applications

OpenCRISPR-1 is immediately applicable to any research or therapeutic context that currently uses SpCas9 or other Type II CRISPR nucleases, with the advantage of improved specificity and potentially reduced immunogenicity. For basic research laboratories, NGG PAM compatibility and compatibility with SpCas9 guide RNAs mean that OpenCRISPR-1 can be adopted without redesigning existing guide RNA libraries or workflows, which lowers the barrier for groups that want reduced off-target editing.

In gene therapy development, the reduced immunogenicity of OpenCRISPR-1 addresses one of the most significant clinical challenges facing CRISPR-based therapeutics: pre-existing immunity against SpCas9 in a substantial fraction of the human population limits who can safely receive SpCas9-based treatments. An AI-designed editor with lower immunogenicity could extend the eligible patient population for CRISPR therapeutics currently in clinical development. The 95% off-target reduction is particularly relevant for therapeutic applications, where unintended edits at non-target genomic sites are safety risks that must be minimized.

Agricultural and industrial biotechnology researchers can apply OpenCRISPR-1 as a high-specificity genome editing tool in non-human organisms, where the protein's divergence from known natural proteins may provide advantages in contexts where natural CRISPR editors perform inconsistently.
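
Because OpenCRISPR-1 keeps the NGG PAM preference, candidate target sites can be located the same way as for SpCas9: find a protospacer immediately followed by NGG on either strand. The sketch below is a minimal scanner under that assumption. The 20-nucleotide spacer length is the usual SpCas9 convention rather than a figure stated in this profile, and the function names and example sequence are illustrative.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase ACGT string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_ngg_sites(dna, spacer_len=20):
    """List (strand, start, protospacer, pam) for every 5'-spacer-NGG-3' match.

    `start` is the 0-based offset on the scanned strand; for "-" hits it is
    measured along the reverse complement, not the input sequence.
    """
    dna = dna.upper()
    # Lookahead keeps the match zero-width so overlapping sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Toy sequence: one 20-nt protospacer followed by an AGG PAM, then padding.
example = "GATTACAGATTACAGATTAC" + "AGG" + "TTT"
sites = find_ngg_sites(example)
for site in sites:
    print(site)
```

A scanner like this only identifies where a guide could bind. It says nothing about editing efficiency or off-target risk, which still have to be measured experimentally.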

Impact

OpenCRISPR-1 is a landmark result in AI-driven protein design because it demonstrates, for the first time, that generative protein language models can produce functional gene editing machinery good enough to edit the human genome, with performance that surpasses the natural prototype in clinically relevant dimensions. The 95% off-target reduction relative to SpCas9 is not a marginal improvement: for many applications it is the difference between a research tool and a potential therapeutic agent. The open-source release under a permissive license distinguishes OpenCRISPR-1 from many AI-designed proteins that remain proprietary, and the tens of thousands of researchers who accessed the sequence within the first year of release reflect genuine demand for accessible, high-quality AI-designed biological tools.

The work also establishes a generalizable platform. The combination of large-scale CRISPR operon mining, protein language model fine-tuning, and high-throughput functional screening is not specific to SpCas9 and could in principle be extended to design editors with novel PAM specificities, altered target range, activity in new cellular contexts, or other properties not easily accessible through natural CRISPR diversity. Profluent's subsequent work on this platform, including the expanded functional testing data in the Nature publication, suggests that OpenCRISPR-1 is the first output of a broader effort in AI-driven gene editor design.

A key limitation is that the model and screening pipeline optimize for a combination of efficiency and specificity measurable in standard cell line assays. Performance in the in vivo contexts relevant to gene therapy, including delivery constraints, chromatin accessibility variation, and tissue-specific expression, requires additional validation.

Tags

protein design, de novo design, transformer, generative, language model, foundation model, DNA, genomics

Resources

GitHub Repository, Research Paper, Official Website