bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance
Single-cell

HitAnno

Tsinghua University

Hierarchical language model for atlas-level cell-type annotation of scATAC-seq data that annotates new query datasets without retraining.

Released: March 2026

Single-cell ATAC sequencing (scATAC-seq) profiles chromatin accessibility across thousands of cells, but assigning cell types to these profiles is harder than for transcriptomic data: accessibility matrices are sparse, high-dimensional, and near-binary, and reference atlases are scarce. As scATAC-seq atlases grow, annotation methods must scale to atlas-level data and generalize to new datasets without per-dataset retraining.

HitAnno, developed by Rui Jiang's group in the Department of Automation at Tsinghua University and posted to bioRxiv in March 2026, addresses this with a hierarchical language model. It converts each cell's accessibility profile into a "cell sentence" built from selected cell-type-specific peaks, then applies a two-level attention mechanism: the lower level models co-accessibility among individual peaks, and the upper level models dependencies across higher-order peak sets.
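
The cell-sentence idea can be illustrated with a minimal sketch. It is not HitAnno's published procedure: the specificity score (open fraction inside a cell type minus open fraction outside it), the `top_k` parameter, the function names, and the toy peak names are all assumptions made for illustration.

```python
from collections import defaultdict


def select_specific_peaks(cells, labels, top_k=2):
    """Pick, per cell type, the top_k peaks whose open fraction in that type
    most exceeds their open fraction in all other cells.
    This score is an illustrative assumption, not HitAnno's criterion."""
    n_peaks = len(cells[0])
    by_type = defaultdict(list)
    for cell, label in zip(cells, labels):
        by_type[label].append(cell)

    selected = []
    for label in sorted(by_type):
        inside = by_type[label]
        outside = [c for c, lab in zip(cells, labels) if lab != label]
        scores = []
        for p in range(n_peaks):
            frac_in = sum(c[p] for c in inside) / len(inside)
            frac_out = sum(c[p] for c in outside) / len(outside) if outside else 0.0
            scores.append((frac_in - frac_out, p))
        # Highest score first; ties broken by lower peak index.
        scores.sort(key=lambda s: (-s[0], s[1]))
        for _, p in scores[:top_k]:
            if p not in selected:
                selected.append(p)
    return selected


def cell_sentence(cell, selected_peaks, peak_names):
    """Return the names of the selected peaks that are accessible in this cell."""
    return [peak_names[p] for p in selected_peaks if cell[p] > 0]


# Toy binary accessibility matrix: rows are cells, columns are peaks (hypothetical data).
peak_names = ["p0", "p1", "p2", "p3", "p4"]
cells = [
    [1, 1, 0, 0, 1],  # T
    [1, 0, 0, 0, 1],  # T
    [0, 0, 1, 1, 1],  # B
    [0, 0, 1, 0, 1],  # B
]
labels = ["T", "T", "B", "B"]

selected = select_specific_peaks(cells, labels, top_k=2)
sentences = [cell_sentence(c, selected, peak_names) for c in cells]
```

In this toy data, peak `p4` is open in every cell, so it carries no cell-type information and is never selected; each sentence contains only peaks that help discriminate between the two types.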

Trained on a human atlas spanning 31 cell types, HitAnno annotates new query datasets directly, without retraining. An accompanying online interface makes the model available for atlas-level scATAC-seq annotation.

Key Features

  • Cell-sentence representation: Encodes each cell's chromatin accessibility as a sentence of selected cell-type-specific peaks, casting annotation as a language-modeling problem.
  • Two-level hierarchical attention: Captures both local co-accessibility among peaks and higher-order dependencies across peak sets, supporting an interpretable annotation process.
  • Zero-shot annotation: After training on a 31-cell-type human atlas, annotates new query datasets directly without retraining.
  • Robust across settings: Annotates major and rare cell types reliably in intra-dataset, cross-donor, and inter-dataset scenarios.
  • Accessible interface: Available through an online tool for applying the model to user data.
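
Claims about rare cell types are usually checked with per-class metrics, because overall accessuracy can hide a missed rare population. The source does not state which metrics the authors report, so the sketch below is only one common way to make the distinction concrete; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_class_recall(true_labels, predicted_labels):
    """Recall for each true cell type: correct calls divided by cells of that type."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def macro_recall(true_labels, predicted_labels):
    """Unweighted mean of per-class recall, so rare types count as much as abundant ones."""
    recalls = per_class_recall(true_labels, predicted_labels)
    return sum(recalls.values()) / len(recalls)


# Toy example: nine abundant cells labelled correctly, one rare cell missed.
true_labels = ["abundant"] * 9 + ["rare"]
predicted_labels = ["abundant"] * 10

overall = accuracy(true_labels, predicted_labels)
by_type = per_class_recall(true_labels, predicted_labels)
macro = macro_recall(true_labels, predicted_labels)
```

Here the overall accuracy is high even though the rare type is never recovered, while the macro-averaged recall drops to one half. The same functions could be applied separately to intra-dataset, cross-donor, and inter-dataset test sets.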

Technical Details

HitAnno is a hierarchical transformer-based language model for scATAC-seq annotation. It first selects cell-type-specific peaks and uses them to construct per-cell "cell sentences," which serve as the model's input. A two-level attention mechanism then processes these sentences hierarchically: lower-level attention captures co-accessibility patterns among individual peaks, while higher-level attention models dependencies across aggregated peak sets, yielding interpretable signals about which accessibility features drive each prediction. The model is trained on a human atlas comprising 31 cell types and is evaluated across intra-dataset, cross-donor, and inter-dataset annotation tasks, where it is reported to label both abundant and rare cell types robustly. Once trained, it performs zero-shot annotation on previously unseen query datasets without additional fine-tuning.
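
The two-level pattern described above can be sketched in plain Python. This is a generic illustration of hierarchical attention, not HitAnno's architecture: the grouping of peaks into sets, the embedding size, the use of identity projections instead of learned query/key/value weights, and mean pooling are all assumptions, and every function name is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.
    Returns (outputs, weights); weights[i][j] is how much item i attends to item j."""
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    weights = [softmax([dot(q, k) / scale for k in vectors]) for q in vectors]
    outputs = [
        [sum(w * v[d] for w, v in zip(row, vectors)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def two_level_attention(peak_sets):
    """peak_sets: list of non-empty groups, each a list of peak embedding vectors."""
    set_vectors, peak_weights = [], []
    for group in peak_sets:
        outputs, weights = self_attention(group)  # lower level: peaks within one set
        set_vectors.append(mean_pool(outputs))
        peak_weights.append(weights)
    outputs, set_weights = self_attention(set_vectors)  # upper level: across sets
    return mean_pool(outputs), peak_weights, set_weights


# Toy input: three peak sets with 2-dimensional peak embeddings (hypothetical values).
peak_sets = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]],
]
cell_vector, peak_weights, set_weights = two_level_attention(peak_sets)
```

The returned `peak_weights` and `set_weights` are the kind of quantity an interpretability analysis would inspect: each row sums to one and indicates which peaks, or which peak sets, contributed most to a given representation. A real model would feed `cell_vector` to a classifier over the reference cell types.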

Applications

HitAnno is intended for researchers analyzing scATAC-seq data who need scalable, reference-based cell-type annotation. It is suited to labeling large atlas-scale datasets, transferring annotations across donors and studies, and identifying rare populations that simpler methods may miss. The accompanying online interface lowers the barrier for experimental groups to annotate new accessibility datasets without building bespoke pipelines, and the model's interpretable attention highlights the peaks underlying each call.

Impact

By adapting the cell-sentence, language-model framing to chromatin accessibility and adding hierarchical attention, HitAnno extends atlas-level, retraining-free annotation (already common for scRNA-seq) into the scATAC-seq setting. Its emphasis on interpretability and robustness to rare cell types addresses recurring pain points in epigenomic annotation. HitAnno is described in a preprint released under a CC-BY-NC license, so its reported performance awaits peer review and broader independent benchmarking, and reuse is governed by non-commercial terms.

Tags

cell_type_annotation, zero_shot_annotation, transformer, language_model, zero_shot, scatac_seq, chromatin_accessibility