Shanghai AI Laboratory / Renmin University of China
A decoder-only multimodal foundation model that natively unifies sequences, 3D structures, and natural language for both small molecules and proteins.
BioMatrix is a multimodal biological foundation model that natively integrates 1D sequences, 3D structures, and natural language for both small molecules and proteins within a single decoder-only architecture. Most prior biological foundation models specialize in one modality (e.g., protein language models such as ESM, or molecular sequence models) and bolt on additional modalities through external encoders, projection adapters, or modality-specific output heads. BioMatrix instead casts every modality into a shared discrete token space and trains a single next-token-prediction objective over the entire "modality matrix," removing the architectural seams that typically separate sequence, structure, and text processing.
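To make the shared-token-space idea concrete, here is a minimal toy sketch of how text, a protein sequence, and discretized structure codes could be flattened into one token stream and trained with a single shifted next-token objective. Everything specific in it is an assumption for illustration, not BioMatrix's actual tokenizer: the tag names (`<text>`, `<prot_seq>`, `<prot_struct>`), the `txt:`/`aa:`/`st:` prefixes, the 8-entry structure codebook, and the helper functions are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical special tokens and modality tags (not BioMatrix's real vocabulary).
SPECIAL = ["<bos>", "<eos>", "<text>", "</text>",
           "<prot_seq>", "</prot_seq>", "<prot_struct>", "</prot_struct>"]
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")
STRUCT_CODEBOOK_SIZE = 8  # toy value; a real structure codebook would be much larger


def build_vocab(text_words: List[str]) -> Dict[str, int]:
    """Place every modality's symbols into one shared id space."""
    tokens = list(SPECIAL)
    tokens += [f"txt:{w}" for w in sorted(set(text_words))]
    tokens += [f"aa:{a}" for a in AMINO_ACIDS]
    tokens += [f"st:{i}" for i in range(STRUCT_CODEBOOK_SIZE)]
    return {tok: i for i, tok in enumerate(tokens)}


def encode_example(vocab: Dict[str, int], text_words: List[str],
                   sequence: str, struct_codes: List[int]) -> List[int]:
    """Flatten text, 1D sequence, and discretized 3D codes into one id list."""
    pieces = ["<bos>", "<text>"]
    pieces += [f"txt:{w}" for w in text_words]
    pieces += ["</text>", "<prot_seq>"]
    pieces += [f"aa:{a}" for a in sequence]
    pieces += ["</prot_seq>", "<prot_struct>"]
    pieces += [f"st:{c}" for c in struct_codes]
    pieces += ["</prot_struct>", "<eos>"]
    return [vocab[p] for p in pieces]


def next_token_pairs(ids: List[int]) -> Tuple[List[int], List[int]]:
    """Inputs and targets for next-token prediction: targets are inputs shifted by one."""
    return ids[:-1], ids[1:]


words = ["fold", "this", "protein"]
vocab = build_vocab(words)
ids = encode_example(vocab, words, "MKT", [3, 0, 5])
inputs, targets = next_token_pairs(ids)
```

Because all three modalities share one id space, the same shifted input/target pairs cover text-to-sequence and sequence-to-structure prediction without any modality-specific head; which direction is generated depends only on the order in which the segments appear in the stream.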
The model was developed by researchers at Shanghai AI Laboratory and the Gaoling School of Artificial Intelligence at Renmin University of China, led by first author Qizhi Pei with senior author Lijun Wu, and released as an arXiv preprint in June 2026. It is built on the Qwen3 language model backbone, offered in 1.7B and 4B parameter sizes, and continually pretrained on 304.4 billion tokens spanning general and domain-specific text, molecular and protein data in both 1D and 3D forms, and cross-modal corpora.
By unifying tokenization across modalities, BioMatrix supports cross-modal generation tasks within one model, such as converting between a protein sequence and its structure, or between a molecule and a textual description. The authors report state-of-the-art or competitive results on 77 of 80 downstream tasks. Because the work is a recent preprint without a peer-reviewed venue, these results should be read as author-reported.
BioMatrix uses a decoder-only transformer based on Qwen3-1.7B-Base and Qwen3-4B-Base, with the 4B-SFT checkpoint supporting an 8,192-token context length. Continual pretraining covers 304.4 billion tokens across text, molecular and protein 1D/3D data, and cross-modal pairs, followed by instruction tuning over 80 downstream tasks (generation, name conversion, property prediction, captioning, folding, and binding-affinity estimation). The instruction-tuning corpus (BioMatrix-SFT) comprises roughly 23.6 million examples drawn from sources including SMolInstruct, MoleculeQA, OpenMolIns, DPLM-2, and PDBBind. Across the 80 evaluation tasks in six categories spanning molecules, proteins, and their interactions, the authors report state-of-the-art or competitive performance on 77, without relying on modality-specific architectural components.
BioMatrix targets computational chemists, structural biologists, and drug-discovery researchers who otherwise juggle separate specialized models for molecular property prediction, protein structure prediction, and biomedical text understanding. A single model handles molecule generation and captioning, protein folding and inverse folding, property and binding-affinity prediction, and cross-modal translation, which simplifies pipelines that combine small-molecule and protein reasoning—for example, structure-aware molecule design or protein-ligand interaction analysis. The Apache-2.0 license and four open checkpoints make it accessible for both direct use and downstream fine-tuning.
BioMatrix is positioned as the first foundation model to natively span the full "modality matrix" of sequences, structures, and language across both small molecules and proteins in one decoder-only model, demonstrating that a unified token space can match or exceed adapter-based and specialized approaches on a broad task suite. If the reported breadth holds up under independent evaluation, the design points toward a simpler recipe for multimodal biological modeling: scaling one autoregressive objective rather than stitching together modality-specific encoders. Because the work is a recent preprint, its real-world adoption and reproducibility remain to be established, and the authors note limitations on complex structures and domain coverage.
Pei, Q., et al. (2026) BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language. arXiv.
DOI: 10.48550/arXiv.2606.22138