bio.rodeo
© 2026 Pulsatance. All rights reserved.
Language model

BioGPT

Microsoft Research Asia / Microsoft Research

A GPT-2-based generative transformer pretrained on 15M PubMed abstracts for biomedical text generation and mining, including relation extraction and question answering.

Released: October 2022
Parameters: 347 Million

BioGPT is a domain-specific generative language model for biomedical text, developed by Microsoft Research and published in Briefings in Bioinformatics in late 2022. While earlier biomedical language models such as BioBERT and PubMedBERT adapted the encoder-only BERT architecture and excelled at discriminative tasks like classification and named-entity recognition, they were poorly suited to natural-language generation. BioGPT closes that gap by bringing a decoder-only, GPT-style generative model into the biomedical domain.

The model is pretrained from scratch on a large corpus of PubMed abstracts, learning the vocabulary, phrasing, and factual associations characteristic of the biomedical literature. This generative formulation lets BioGPT handle a wide range of tasks—relation extraction, question answering, document classification, and free-form text generation—within a single autoregressive framework, typically by casting structured outputs as generated text under task-specific prompts.
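To make "casting structured outputs as generated text" concrete, here is a minimal Python sketch. It serializes relation triples into one target string for fine-tuning and parses generated text back into triples. The template wording, the `; ` separator, and the helper names `triples_to_target` and `target_to_triples` are hypothetical illustrations. They are not BioGPT's exact per-task formats.

```python
import re
from typing import List, Tuple

# (head entity, tail entity, relation label)
Triple = Tuple[str, str, str]

# Hypothetical natural-language template; BioGPT's actual wording differs by task.
TEMPLATE = "the relation between {head} and {tail} is {rel}"
PATTERN = re.compile(r"the relation between (.+?) and (.+?) is (.+)")
SEPARATOR = "; "


def triples_to_target(triples: List[Triple]) -> str:
    """Serialize gold triples into one target sentence for fine-tuning."""
    parts = [TEMPLATE.format(head=h, tail=t, rel=r) for h, t, r in triples]
    return SEPARATOR.join(parts) + "."


def target_to_triples(text: str) -> List[Triple]:
    """Parse generated text back into triples, dropping fragments that do not fit the template."""
    triples: List[Triple] = []
    for fragment in text.strip().rstrip(".").split(SEPARATOR):
        match = PATTERN.fullmatch(fragment.strip())
        if match:
            triples.append((match.group(1), match.group(2), match.group(3)))
    return triples


gold = [("aspirin", "COX-1", "inhibitor"), ("warfarin", "bleeding", "adverse effect")]
target = triples_to_target(gold)
print(target)
print(target_to_triples(target))
```

The parser is deliberately strict. It drops fragments that do not match the template instead of guessing their contents. It also mis-splits entity names that themselves contain ` and ` or `; `, so templated output formats need to be designed with the parser in mind.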

Developed by Renqian Luo, Yingce Xia, Tao Qin, Tie-Yan Liu and collaborators, BioGPT was among the first generative pretrained transformers tailored specifically to biomedicine, and it helped establish generative modeling as a viable approach for biomedical text mining alongside the then-dominant encoder-based models.

#Key Features

  • Generative biomedical pretraining: Unlike encoder-only models such as BioBERT, BioGPT is a decoder-only autoregressive transformer, enabling it to generate fluent, domain-appropriate biomedical text rather than only encoding it.
  • Prompt- and template-based task formulation: Structured tasks like relation extraction are reframed as text generation using target sequences and prompts, allowing one model to address many downstream tasks.
  • Strong relation-extraction performance: At release, BioGPT set state-of-the-art F1 scores on the end-to-end biomedical relation-extraction benchmarks BC5CDR, KD-DTI, and DDI.
  • State-of-the-art biomedical QA: It achieved 78.2% accuracy on PubMedQA, a new best result for the benchmark when published.
  • Open weights and permissive license: Pretrained checkpoints are released on HuggingFace under the MIT license, with training and fine-tuning code on GitHub.
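The relation-extraction results above are F1 scores over extracted relation triples. As a rough illustration of how such a score is computed, the sketch below micro-averages precision, recall, and F1 over exact-match `(head, tail, relation)` triples, using one set per document. It assumes exact string matching and is not the official scorer for BC5CDR, KD-DTI, or DDI. Each of those benchmarks applies its own matching rules, so this function will not reproduce the reported numbers. The name `micro_prf` and the toy data are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, tail, relation)


def micro_prf(predicted: List[Set[Triple]], gold: List[Set[Triple]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact-match triples, one set per document."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same documents")
    tp = fp = fn = 0
    for pred_doc, gold_doc in zip(predicted, gold):
        tp += len(pred_doc & gold_doc)  # correct triples
        fp += len(pred_doc - gold_doc)  # spurious triples
        fn += len(gold_doc - pred_doc)  # missed triples
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


pred = [{("a", "b", "r")}, {("c", "d", "r"), ("e", "f", "r")}]
gold = [{("a", "b", "r"), ("x", "y", "r")}, {("c", "d", "r")}]
print(micro_prf(pred, gold))
```

Micro-averaging pools counts across documents before dividing. Documents with many relations therefore weigh more than they would under a per-document (macro) average.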

#Technical Details

BioGPT adopts the GPT-2 architecture. The base model uses the GPT-2 medium configuration: a 24-layer Transformer decoder with 1024-dimensional hidden states and 16 attention heads, roughly 347 million parameters in total. Its Byte-Pair Encoding vocabulary is learned on the in-domain corpus rather than reused from GPT-2. A larger variant, BioGPT-Large, scales up to the GPT-2 XL configuration (~1.5B parameters).

The model is pretrained with a standard autoregressive language-modeling objective on approximately 15 million PubMed items, each with both a title and an abstract, collected up to 2021.

On downstream evaluations, BioGPT reported end-to-end relation-extraction F1 scores of 44.98% on BC5CDR, 38.42% on KD-DTI, and 40.76% on DDI. It also reported 78.2% accuracy on PubMedQA and competitive results on the HoC document-classification task. Across these benchmarks it matched or exceeded prior biomedical language models.
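As a sanity check on the ~347M figure, the arithmetic below counts the weights of a GPT-2-style decoder with the stated shape (24 layers, 1024-dimensional hidden states). It relies on several assumptions that this page does not state:

  • the feed-forward width is 4 times the hidden size;
  • learned positional embeddings cover 1,024 positions;
  • every linear layer has a bias;
  • the output projection is tied to the token embeddings;
  • the vocabulary has 42,384 BPE tokens, which is our reading of the public `microsoft/biogpt` configuration.

The helper name is hypothetical. The number of attention heads does not change the total, because the heads split the same 1024-dimensional projections.

```python
def decoder_param_count(n_layers: int, d_model: int, vocab_size: int,
                        n_positions: int, ff_mult: int = 4) -> int:
    """Count weights in a GPT-2-style decoder with tied input/output embeddings."""
    d_ff = ff_mult * d_model
    # Token embeddings plus learned positional embeddings; the output
    # projection reuses the token-embedding matrix, so it adds nothing.
    embeddings = vocab_size * d_model + n_positions * d_model
    # Q, K, V, and output projections, each d_model x d_model with a bias.
    attention = 4 * (d_model * d_model + d_model)
    # Two feed-forward linear layers (d_model -> d_ff -> d_model) with biases.
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector.
    block_norms = 2 * 2 * d_model
    final_norm = 2 * d_model
    return embeddings + n_layers * (attention + ffn + block_norms) + final_norm


# Assumed BioGPT shape; vocab_size=42384 is our reading of the public config.
biogpt = decoder_param_count(n_layers=24, d_model=1024, vocab_size=42384, n_positions=1024)
# Same shape with GPT-2's 50,257-token vocabulary (GPT-2 medium).
gpt2_medium = decoder_param_count(n_layers=24, d_model=1024, vocab_size=50257, n_positions=1024)
print(f"{biogpt:,}")       # 346,761,216
print(f"{gpt2_medium:,}")  # 354,823,168
```

Under these assumptions the total is 346,761,216, which rounds to 347M. Swapping in GPT-2's own vocabulary gives about 355M, the familiar GPT-2 medium size. Almost all of the gap between the two therefore comes from BioGPT's smaller in-domain vocabulary. Implementation details such as extra reserved position slots can shift the exact count slightly.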

#Applications

BioGPT supports biomedical researchers and NLP practitioners who need to extract structured knowledge from, or generate text grounded in, the published literature. Typical uses include mining drug–target and drug–drug interactions, chemical–disease relations, answering research-style biomedical questions, classifying abstracts by topic, and producing fluent descriptions of biomedical entities. Because the weights are openly available under a permissive license, it serves both as a ready-to-use model and as a strong initialization for fine-tuning on specialized literature-mining pipelines, clinical-adjacent text tasks, and knowledge-base construction.
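For knowledge-base construction, one simple pattern is to pool triples extracted from many abstracts and keep only those supported by several documents. This also reduces the impact of one-off hallucinated relations. The sketch below assumes the triples have already been extracted, for example by parsing BioGPT's generated text. The function names `normalize` and `build_kb`, the document IDs, and the lowercase-and-strip normalization are all hypothetical stand-ins. A real pipeline would link entities to a controlled vocabulary instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, tail, relation)


def normalize(triple: Triple) -> Triple:
    """Crude hypothetical normalization; real pipelines link entities to a vocabulary."""
    head, tail, rel = triple
    return (head.strip().lower(), tail.strip().lower(), rel.strip().lower())


def build_kb(extractions: Iterable[Tuple[str, List[Triple]]],
             min_support: int = 1) -> Dict[Triple, Set[str]]:
    """Map each normalized triple to the set of documents it was extracted from."""
    support: Dict[Triple, Set[str]] = defaultdict(set)
    for doc_id, triples in extractions:
        for triple in triples:
            support[normalize(triple)].add(doc_id)
    # Keep only relations asserted by at least `min_support` distinct documents.
    return {t: docs for t, docs in support.items() if len(docs) >= min_support}


extractions = [
    ("doc1", [("Aspirin", "COX-1", "inhibitor")]),
    ("doc2", [("aspirin ", "cox-1", "Inhibitor"), ("drugA", "drugB", "effect")]),
]
print(build_kb(extractions, min_support=2))
```

The function counts distinct documents rather than raw mentions, so a relation repeated within one abstract does not inflate its support.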

#Impact

BioGPT was an influential early demonstration that generative pretrained transformers could match or surpass encoder-based models on core biomedical NLP benchmarks, helping shift the field toward generative and prompt-based formulations that later became standard with the rise of large language models. Its open release on HuggingFace made it widely adopted as a baseline and starting point for biomedical text-mining research. As a relatively small model trained on abstracts rather than full texts, it has clear limitations—including susceptibility to factual hallucination and weaker performance than much larger general-purpose LLMs on open-ended generation—but it remains a well-cited reference point and a practical, lightweight option for domain-specific literature tasks.

Citation

BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining

Luo, R., et al. (2022). BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining. Briefings in Bioinformatics.

DOI: 10.1093/bib/bbac409

Citations

Total citations: 1.4K
Influential citations: 94
References: 59

GitHub

Stars: 4.5K
Forks: 482
Open issues: 74
Contributors: 9
Last push: 1 year ago
Language: Python
License: MIT

HuggingFace

Downloads: 137.9K
Likes: 305
Last modified: 3 years ago
Pipeline: text-generation

Openness

bio.rodeo openness: 66 (Partial). Open weights, closed recipe.
Usability (can I run it?): 87
Reproducibility (can I retrain it?): 44
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

biomedical_literature, generative, gpt, language_model, question_answering, relation_extraction, self_supervised, text_generation, transformer

Resources

  • GitHub Repository
  • Research Paper
  • HuggingFace Model