Overview

ATOM-1 is the first RNA foundation model trained on chemical mapping data, developed by Atomic AI and introduced in December 2023. Unlike prior RNA models that learn exclusively from sequence databases of naturally occurring RNA, ATOM-1 is pre-trained on large-scale experimental readouts — measurements of how chemical reagents modify individual nucleotides in a structure-dependent manner. This approach gives the model direct exposure to the conformational states that RNA adopts in solution and in cells, embedding physical information that sequence data alone cannot capture.

The central challenge motivating ATOM-1 is the rational design of RNA-based medicines. mRNA vaccines, small interfering RNAs (siRNAs), and circular RNAs are increasingly important therapeutic modalities, but optimizing these molecules through iterative experimental screening is slow and expensive. Accurate computational models that predict RNA structure and function from sequence could dramatically accelerate this design cycle. ATOM-1 addresses this bottleneck by providing rich sequence embeddings that can be rapidly adapted to diverse downstream RNA prediction tasks using small probe neural networks trained on limited additional data.

ATOM-1 was developed by a team at Atomic AI led by Raphael J. L. Townshend, who was also lead author of ATOM3D, a benchmark suite for machine learning on three-dimensional molecular structure. The preprint describes both the data collection strategy, designed specifically for ML-scale training, and benchmark evaluations across secondary structure, tertiary structure, and mRNA stability prediction. As of the preprint's posting, ATOM-1 had not undergone peer review.

Key Features

Chemical mapping pre-training: ATOM-1 is trained on data from millions of RNA sequences with over one billion nucleotide-level measurements obtained through in-house chemical probing experiments (using probes such as DMS and SHAPE reagents), giving it an experimental grounding that sequence-only models lack.
Probe-based adaptation: Rather than fine-tuning the full model, downstream tasks are addressed using small single-hidden-layer MLP probe networks trained on ATOM-1 embeddings, enabling state-of-the-art accuracy on multiple RNA prediction tasks with limited labeled data.
Dual representation output: For an RNA sequence of length n, ATOM-1's encoder produces two structured representations — a single representation of shape n x 512 capturing per-nucleotide features, and a pair representation of shape n x n x 256 capturing pairwise nucleotide relationships — providing rich input for structural inference tasks.
Pseudoknot-aware secondary structure prediction: Probe networks trained on ATOM-1 embeddings can predict complex secondary structure elements including pseudoknots, which standard nearest-neighbor folding algorithms such as RNAfold exclude from their search space.
Broad RNA modality coverage: The model supports prediction tasks across structurally and functionally distinct RNA classes, including mRNA, siRNA, and circular RNA, making it relevant to a wide range of therapeutic design contexts.
State-of-the-art mRNA stability prediction: In a retrospective benchmarking analysis, an ATOM-1-derived predictor outperformed all 1,600 competing methods entered in a vaccine design challenge for predicting in-solution mRNA stability.
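
To make the pseudoknot item above concrete: a secondary structure is commonly written as a set of base pairs (i, j), and it contains a pseudoknot when two pairs cross, that is, when i < k < j < l for pairs (i, j) and (k, l). The sketch below is a generic check and is not taken from the ATOM-1 preprint; the function name `has_pseudoknot` and the example pair lists are illustrative assumptions.

```python
from itertools import combinations

def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l)."""
    # Normalise each pair so the 5' index comes first, then sort by it.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for (i, j), (k, l) in combinations(ordered, 2):
        # After sorting, i <= k. The pairs cross when k lies strictly
        # inside (i, j) and l lies strictly outside it.
        if i < k < j < l:
            return True
    return False

nested = [(0, 9), (1, 8), (2, 7)]            # simple hairpin stem
crossing = [(0, 6), (1, 5), (3, 9), (4, 8)]  # two interleaved stems

print(has_pseudoknot(nested), has_pseudoknot(crossing))  # False True
```

Standard minimum-free-energy folding recursions of the kind RNAfold uses only consider structures for which this check returns False, which is why a predictor that scores all pairs (i, j) independently can represent structures they cannot.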

Technical Details

ATOM-1 is a structure-aware encoder-decoder transformer trained on data collected via next-generation sequencing (NGS) readout of chemical probing experiments. The dataset encompasses millions of RNA sequences and over one billion nucleotide-level measurements, generated through custom wet-lab assays developed specifically for ML-scale training — a scale of experimental supervision not previously applied to RNA foundation models. The encoder produces two representations: a per-nucleotide single representation (n x 512) and an all-pairs representation (n x n x 256), the latter being particularly important for capturing base-pairing and long-range structural contacts.
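
The two output shapes scale very differently with sequence length: the single representation grows linearly in n, the pair representation quadratically. The short calculation below illustrates this using the dimensions stated above (512 and 256). The 4-bytes-per-value figure assumes 32-bit floats, which the preprint summary here does not specify, and `representation_sizes` is a hypothetical helper name.

```python
def representation_sizes(n, single_dim=512, pair_dim=256, bytes_per_value=4):
    """Element counts and byte sizes of the two encoder outputs at length n.

    bytes_per_value=4 assumes float32 storage (an assumption, not a
    documented property of ATOM-1).
    """
    single_values = n * single_dim      # shape n x 512
    pair_values = n * n * pair_dim      # shape n x n x 256
    return {
        "single_values": single_values,
        "pair_values": pair_values,
        "single_bytes": single_values * bytes_per_value,
        "pair_bytes": pair_values * bytes_per_value,
    }

for n in (100, 1000):
    sizes = representation_sizes(n)
    print(n, sizes["single_bytes"] / 1e6, "MB single,",
          sizes["pair_bytes"] / 1e6, "MB pair")
```

Under that assumption, a 1,000-nucleotide sequence yields about 2 MB of single representation but about 1 GB of pair representation, so the pair tensor dominates memory for long transcripts.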

Benchmark evaluations compare ATOM-1 probe networks to RNAfold (a thermodynamic method), CONTRAfold (a statistically trained method), and RNA-FM (a sequence-only RNA language model) across three secondary structure datasets: PDB-derived structures, ArchiveII, and bpRNA-1m TS0. ATOM-1 probes are competitive with or superior to RNAfold and CONTRAfold and substantially outperform RNA-FM probes, demonstrating that chemical mapping pre-training encodes structural information beyond what sequence co-evolution alone provides. For tertiary structure and mRNA stability, ATOM-1 similarly achieves top-ranked performance against existing methods. Exact parameter counts and full training hyperparameters are not disclosed in the preprint.
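
Secondary structure benchmarks of this kind are commonly scored by comparing predicted and reference base pairs with precision, recall, and F1. The sketch below shows that calculation in its simplest form. It assumes exact-match scoring of pairs; whether the preprint uses exactly this convention is not stated here, and `base_pair_f1` and the example pairs are illustrative.

```python
def base_pair_f1(predicted, reference):
    """Precision, recall and F1 over sets of base pairs (i, j)."""
    # Store each pair with the smaller index first so (i, j) == (j, i).
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

reference = [(0, 9), (1, 8), (2, 7)]
predicted = [(0, 9), (1, 8), (3, 6)]   # two correct pairs, one wrong

precision, recall, f1 = base_pair_f1(predicted, reference)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667
```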

Applications

ATOM-1 is designed primarily for therapeutic RNA development. Researchers can use the model's embeddings as input features for predicting RNA secondary and tertiary structure, in-solution stability of mRNA constructs, and the activity of siRNAs and other RNA therapeutics. The probe-based adaptation framework means that teams with relatively small labeled datasets — such as company-internal experimental screens — can develop accurate, task-specific predictors without large-scale fine-tuning infrastructure. In vaccine development contexts, accurate mRNA stability prediction directly informs sequence optimization decisions that affect immunogenicity and shelf life.
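
As a rough picture of what a probe-based predictor looks like, the sketch below runs the forward pass of a single-hidden-layer MLP on a per-nucleotide embedding to produce one scalar per sequence (for example, a stability score). Several things here are assumptions rather than details from the preprint: the mean pooling step, the hidden width of 16, and the names `mean_pool` and `mlp_probe`. The embedding and weights are random placeholders, since ATOM-1 embeddings are not publicly available, so the output value itself is meaningless.

```python
import random

def mean_pool(single):
    """Average an n x d per-nucleotide matrix into one d-vector."""
    n = len(single)
    return [sum(row[k] for row in single) / n for k in range(len(single[0]))]

def mlp_probe(features, w1, b1, w2, b2):
    """Single-hidden-layer MLP: ReLU hidden layer, scalar linear output."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2

rng = random.Random(0)
n, d, hidden_dim = 30, 512, 16   # d matches the n x 512 single representation

# Random stand-in for an embedding (hypothetical data, not model output).
single = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]

# Untrained placeholder weights; in practice these would be fit to a
# small labelled dataset while the encoder stays frozen.
w1 = [[rng.gauss(0, 0.05) for _ in range(d)] for _ in range(hidden_dim)]
b1 = [0.0] * hidden_dim
w2 = [rng.gauss(0, 0.05) for _ in range(hidden_dim)]
b2 = 0.0

score = mlp_probe(mean_pool(single), w1, b1, w2, b2)
print(type(score).__name__, len(mean_pool(single)))  # float 512
```

With these sizes the probe has 512 × 16 + 16 + 16 + 1 = 8,225 trainable parameters, which is the sense in which such a predictor can be fit on a small internal screen without fine-tuning infrastructure.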

Impact

ATOM-1 represents a methodological shift in how RNA foundation models are built, showing that experimental chemical mapping data, and not only genomic sequence, can serve as a pre-training signal. The benchmark results reported in the preprint, particularly the top ranking across 1,600 methods in the mRNA stability analysis, support the claim that this data modality confers meaningful advantages. As of the preprint's release, Atomic AI positioned ATOM-1 as a platform component of its proprietary drug discovery pipeline, so the model weights and training data are not publicly released, a notable limitation for academic reuse. The work nonetheless sets a precedent for the field and may motivate broader use of experimental signal in RNA model pre-training, analogous to how AlphaFold 2 demonstrated the value of co-evolutionary data for protein structure prediction.

Citation

ATOM-1: A Foundation Model for RNA Structure and Function Built on Chemical Mapping Data

Preprint

Boyd, N., Anderson, B. M., Townshend, B., et al. (2023). ATOM-1: A Foundation Model for RNA Structure and Function Built on Chemical Mapping Data. bioRxiv, 2023.12.13.571579.

DOI: 10.1101/2023.12.13.571579

