ATOM-1 is the first RNA foundation model trained on chemical mapping data, developed by Atomic AI and introduced in December 2023. Unlike prior RNA models that learn exclusively from sequence databases of naturally occurring RNA, ATOM-1 is pre-trained on large-scale experimental readouts — measurements of how chemical reagents modify individual nucleotides in a structure-dependent manner. This approach gives the model direct exposure to the conformational states that RNA adopts in solution and in cells, embedding physical information that sequence data alone cannot capture.
The central challenge motivating ATOM-1 is the rational design of RNA-based medicines. mRNA vaccines, small interfering RNAs (siRNAs), and circular RNAs are increasingly important therapeutic modalities, but optimizing these molecules through iterative experimental screening is slow and expensive. Accurate computational models that predict RNA structure and function from sequence could dramatically accelerate this design cycle. ATOM-1 addresses this bottleneck by providing rich sequence embeddings that can be rapidly adapted to diverse downstream RNA prediction tasks using small probe neural networks trained on limited additional data.
ATOM-1 was developed by a team at Atomic AI led by Raphael J. L. Townshend, who was also first author of ATOM3D, a benchmark suite for machine learning on three-dimensional molecular structure. The preprint describes both the data collection strategy, which was designed specifically for ML-scale training, and benchmark evaluations across secondary structure, tertiary structure, and mRNA stability prediction. As of its posting, the preprint had not undergone peer review.
ATOM-1 is a structure-aware encoder-decoder transformer trained on data collected via next-generation sequencing (NGS) readout of chemical probing experiments. The dataset encompasses millions of RNA sequences and over one billion nucleotide-level measurements, generated through custom wet-lab assays developed specifically for ML-scale training, a scale of experimental supervision not previously applied to RNA foundation models. For an input of n nucleotides, the encoder produces two representations: a per-nucleotide single representation (n × 512) and an all-pairs representation (n × n × 256). The pair representation is particularly important for capturing base-pairing and long-range structural contacts.
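The two encoder outputs and the way a small probe might consume the pair representation can be sketched as follows. This is only an illustration of the stated shapes: `fake_encoder`, `pair_probe`, and the randomly initialized weights are hypothetical stand-ins, not the ATOM-1 interface, and the preprint does not describe the probe at this level of detail.

```python
import math
import random

SINGLE_DIM = 512  # per-nucleotide channel width stated in the text
PAIR_DIM = 256    # all-pairs channel width stated in the text


def fake_encoder(sequence, seed=0):
    """Hypothetical stand-in for the encoder: random values with the stated shapes."""
    rng = random.Random(seed)
    n = len(sequence)
    single = [[rng.gauss(0.0, 1.0) for _ in range(SINGLE_DIM)] for _ in range(n)]
    pair = [[[rng.gauss(0.0, 1.0) for _ in range(PAIR_DIM)]
             for _ in range(n)] for _ in range(n)]
    return single, pair


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pair_probe(pair, weights, bias=0.0):
    """Linear probe on each pair vector, symmetrized so that P[i][j] == P[j][i]."""
    n = len(pair)
    logits = [[sum(w * x for w, x in zip(weights, pair[i][j])) + bias
               for j in range(n)] for i in range(n)]
    return [[sigmoid(0.5 * (logits[i][j] + logits[j][i])) for j in range(n)]
            for i in range(n)]


sequence = "GGGAAACCC"
single, pair = fake_encoder(sequence)

# Untrained, illustrative probe weights; a real probe would be fit to labeled data.
probe_rng = random.Random(1)
weights = [probe_rng.gauss(0.0, 0.05) for _ in range(PAIR_DIM)]
contact_probs = pair_probe(pair, weights)

print(len(single), len(single[0]))               # n, SINGLE_DIM
print(len(pair), len(pair[0]), len(pair[0][0]))  # n, n, PAIR_DIM
```

Averaging the logits for (i, j) and (j, i) is one simple way to make the output symmetric, which is a natural constraint for base-pair contacts. It is a design choice made for this sketch only.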
Benchmark evaluations compare ATOM-1 probe networks to RNAfold, CONTRAFold, and RNA-FM (a sequence-only RNA language model) across three secondary structure datasets: PDB-derived structures, ArchiveII, and bpRNA-1m TS0. ATOM-1 probes are competitive with or superior to the physics-inspired thermodynamic methods and substantially outperform RNA-FM probes, suggesting that chemical mapping pre-training encodes structural information beyond what sequence-only pre-training provides. For tertiary structure and mRNA stability prediction, ATOM-1 is likewise reported to rank first among the compared methods. Exact parameter counts and full training hyperparameters are not disclosed in the preprint.
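To make the secondary structure comparison concrete, the sketch below scores a predicted structure against a reference using F1 over base pairs. The choice of metric is an assumption: the summary above does not state which score the benchmarks use, and F1 on base pairs is simply a common convention. The function names are illustrative, and only plain parentheses are handled, so pseudoknot brackets are treated as unpaired.

```python
def parse_dot_bracket(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
        # any other character is treated as unpaired
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def base_pair_f1(predicted, reference):
    """F1 score between predicted and reference base-pair sets."""
    pred_pairs = parse_dot_bracket(predicted)
    ref_pairs = parse_dot_bracket(reference)
    if not pred_pairs and not ref_pairs:
        return 1.0  # both structures are fully unpaired
    true_positives = len(pred_pairs & ref_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(ref_pairs)
    return 2 * precision * recall / (precision + recall)


# One of the two reference pairs is recovered: precision 1.0, recall 0.5.
score = base_pair_f1("(....)", "((..))")
print(round(score, 3))
```

In a benchmark setting, a score like this would be averaged over every sequence in a test set such as ArchiveII.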
ATOM-1 is designed primarily for therapeutic RNA development. Researchers can use the model's embeddings as input features for predicting RNA secondary and tertiary structure, in-solution stability of mRNA constructs, and the activity of siRNAs and other RNA therapeutics. The probe-based adaptation framework means that teams with relatively small labeled datasets — such as company-internal experimental screens — can develop accurate, task-specific predictors without large-scale fine-tuning infrastructure. In vaccine development contexts, accurate mRNA stability prediction directly informs sequence optimization decisions that affect immunogenicity and shelf life.
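The probe-based workflow can be illustrated with a minimal sketch: frozen per-nucleotide embeddings are mean-pooled into one vector per sequence, and a small linear probe is fit to a scalar label such as a stability measurement. Everything here is an assumption made for illustration. The embeddings and labels are synthetic, the width is 8 rather than 512 to keep the demo fast, and the preprint's actual probe architecture and training procedure are not reproduced.

```python
import random


def mean_pool(single):
    """Average an (n x d) per-nucleotide embedding into one length-d vector."""
    n, d = len(single), len(single[0])
    return [sum(row[k] for row in single) / n for k in range(d)]


def predict(w, b, x):
    """Linear probe output for one pooled feature vector."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


def mean_squared_error(w, b, features, targets):
    return sum((predict(w, b, x) - y) ** 2
               for x, y in zip(features, targets)) / len(targets)


def fit_linear_probe(features, targets, lr=0.1, epochs=500):
    """Fit y ~ w.x + b by full-batch gradient descent; the features stay frozen."""
    d, m = len(features[0]), len(features)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, targets):
            err = predict(w, b, x) - y
            for k in range(d):
                grad_w[k] += err * x[k]
            grad_b += err
        w = [wk - lr * g / m for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Synthetic stand-ins: 40 "sequences" with made-up embeddings and labels.
rng = random.Random(0)
dim = 8  # tiny demo width; the real single representation is stated as 512
true_w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
features, targets = [], []
for _ in range(40):
    n = rng.randint(20, 40)  # sequence length
    profile = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    single = [[p + rng.gauss(0.0, 0.5) for p in profile] for _ in range(n)]
    x = mean_pool(single)
    features.append(x)
    targets.append(predict(true_w, 1.0, x))  # noiseless synthetic label

initial_mse = mean_squared_error([0.0] * dim, 0.0, features, targets)
w, b = fit_linear_probe(features, targets)
final_mse = mean_squared_error(w, b, features, targets)
print("MSE before: %.4f, after: %.6f" % (initial_mse, final_mse))
```

Because only the probe parameters are updated, this kind of adaptation needs little data and compute. That matches the appeal described above for teams working with small internal screens.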
ATOM-1 represents a methodological shift in how RNA foundation models are built, establishing that experimental chemical mapping data, and not just genomic sequence, can and should be used for pre-training. The benchmark results, particularly the top ranking among 1,600 methods in the mRNA stability analysis, provide concrete evidence that this data modality confers meaningful advantages. As of the preprint's release, Atomic AI positioned ATOM-1 as a platform component of its proprietary drug discovery pipeline, and the model weights and training data were not publicly released, a notable limitation for academic reuse. The work nonetheless sets an important precedent for the field and may motivate broader adoption of experimental signal in RNA model pre-training, analogous to how AlphaFold 2 demonstrated the value of co-evolutionary data for protein structure prediction.
Boyd, N., Anderson, B. M., Townshend, B., et al. (2023). ATOM-1: A Foundation Model for RNA Structure and Function Built on Chemical Mapping Data. bioRxiv, 2023.12.13.571579.
DOI: 10.1101/2023.12.13.571579