BioGPT

Microsoft Research Asia / Microsoft Research

Generative transformer pretrained on PubMed abstracts for biomedical text generation and mining, including relation extraction and question answering.

Released: October 2022

Parameters: 347 Million

BioGPT is a domain-specific generative language model for biomedical text, developed by Microsoft Research and published in Briefings in Bioinformatics in late 2022. While earlier biomedical language models such as BioBERT and PubMedBERT adapted the encoder-only BERT architecture and excelled at discriminative tasks like classification and named-entity recognition, they were poorly suited to natural-language generation. BioGPT closes that gap by bringing a decoder-only, GPT-style generative model into the biomedical domain.

The model is pretrained from scratch on a large corpus of PubMed abstracts, learning the vocabulary, phrasing, and factual associations characteristic of the biomedical literature. Because it is generative, BioGPT can handle a wide range of tasks (relation extraction, question answering, document classification, and free-form text generation) within a single autoregressive framework, typically by casting structured outputs as generated text under task-specific prompts.
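To make the idea of casting structured outputs as generated text concrete, here is a minimal sketch of how relation triples might be linearized into a target sentence and parsed back out of a generation. The template string, the `linearize` and `parse` helpers, and the example entities are all hypothetical and chosen for illustration; they are not taken from the BioGPT code base, and the templates actually used for each benchmark may differ.

```python
import re
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, tail entity, relation)

# Hypothetical template, for illustration only.
TEMPLATE = "the relation between {head} and {tail} is {relation}"
PATTERN = re.compile(r"the relation between (.+?) and (.+?) is (.+)")


def linearize(triples: List[Triple]) -> str:
    """Turn structured triples into one target string a decoder could be trained to emit."""
    return "; ".join(
        TEMPLATE.format(head=h, tail=t, relation=r) for h, t, r in triples
    )


def parse(generated: str) -> List[Triple]:
    """Recover triples from generated text; clauses that do not fit the template are skipped."""
    triples: List[Triple] = []
    for clause in generated.split(";"):
        match = PATTERN.fullmatch(clause.strip().rstrip("."))
        if match:
            head, tail, relation = (part.strip() for part in match.groups())
            triples.append((head, tail, relation))
    return triples


example = [
    ("aspirin", "warfarin", "interaction"),
    ("metformin", "lactic acidosis", "induces"),
]
target = linearize(example)
recovered = parse(target)
```

A simple regular expression like this one is ambiguous when an entity name itself contains " and " or " is ", so a real pipeline would likely need a more robust delimiter scheme or entity matching against the source text.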

Developed by Renqian Luo, Yingce Xia, Tao Qin, Tie-Yan Liu and collaborators, BioGPT was among the first generative pretrained transformers tailored specifically to biomedicine, and it helped establish generative modeling as a viable approach for biomedical text mining alongside the then-dominant encoder-based models.

Key Features

Generative biomedical pretraining: Unlike encoder-only models such as BioBERT, BioGPT is a decoder-only autoregressive transformer, enabling it to generate fluent, domain-appropriate biomedical text rather than only encoding it.
Prompt- and template-based task formulation: Structured tasks like relation extraction are reframed as text generation using target sequences and prompts, allowing one model to address many downstream tasks.
Strong relation-extraction performance: BioGPT set state-of-the-art F1 scores on end-to-end biomedical relation-extraction benchmarks including BC5CDR, KD-DTI, and DDI at the time of release.
State-of-the-art biomedical QA: It achieved 78.2% accuracy on PubMedQA, a new best result for the benchmark when published.
Open weights and permissive license: Pretrained checkpoints are released on HuggingFace under the MIT license, with training and fine-tuning code on GitHub.
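As a rough illustration of using the open checkpoints, the sketch below loads the base model through the Hugging Face `transformers` library and completes a prompt with beam search. It assumes the Hub identifier `microsoft/biogpt`, and that `transformers`, `torch`, and `sacremoses` (needed by the BioGPT tokenizer) are installed; the helper names `load_biogpt` and `complete` and the generation settings are illustrative choices, not recommendations from the authors.

```python
MODEL_ID = "microsoft/biogpt"  # assumed Hub identifier for the base checkpoint


def load_biogpt(model_id: str = MODEL_ID):
    """Download (or load from cache) the tokenizer and model.

    The import is deferred so that this module can be imported
    even on machines where transformers is not installed.
    """
    from transformers import BioGptForCausalLM, BioGptTokenizer

    tokenizer = BioGptTokenizer.from_pretrained(model_id)
    model = BioGptForCausalLM.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def complete(prompt: str, tokenizer, model,
             max_new_tokens: int = 40, num_beams: int = 5) -> str:
    """Continue a prompt with beam search and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=num_beams,
        early_stopping=True,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Typical use would be `tokenizer, model = load_biogpt()` followed by `print(complete("COVID-19 is", tokenizer, model))`. The first call downloads the weights, and generated text should be treated as unverified, since the model can produce fluent but incorrect statements.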

Technical Details

BioGPT adopts the GPT-2 medium architecture: the base model is a 24-layer Transformer decoder with 1024-dimensional hidden states and 16 attention heads, totaling roughly 347 million parameters, with a Byte-Pair Encoding vocabulary learned on the in-domain corpus. It is pretrained with a standard autoregressive language-modeling objective on approximately 15 million PubMed records (titles and abstracts) collected up to 2021. A larger variant, BioGPT-Large, scales up to the GPT-2 XL configuration (~1.5B parameters).

On downstream evaluations, BioGPT reported F1 scores of 44.98% on BC5CDR, 38.42% on KD-DTI, and 40.76% on DDI for end-to-end relation extraction, 78.2% accuracy on PubMedQA, and competitive results on the HoC document-classification task, matching or exceeding the prior biomedical language models it was compared against on these benchmarks.
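The parameter figure quoted for the base configuration can be sanity-checked with a back-of-the-envelope count for a GPT-2-style decoder. The sketch below assumes a vocabulary of 42,384 tokens, 1,024 learned position embeddings, a 4x feed-forward expansion, and input embeddings tied to the output projection. None of these values is stated in the text above, so treat them as assumptions; the function name `gpt2_style_param_count` is likewise only illustrative.

```python
def gpt2_style_param_count(n_layers: int, d_model: int,
                           vocab_size: int, n_positions: int) -> int:
    """Approximate parameter count of a GPT-2-style decoder with tied embeddings."""
    # Self-attention: Q, K, V and output projections, each d x d plus a bias.
    attention = 4 * d_model * d_model + 4 * d_model
    # Feed-forward: d -> 4d -> d, with biases on both linear layers.
    mlp = 8 * d_model * d_model + 5 * d_model
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d_model
    per_layer = attention + mlp + layer_norms

    token_embeddings = vocab_size * d_model      # shared with the output layer (assumed)
    position_embeddings = n_positions * d_model  # learned positions (assumed)
    final_layer_norm = 2 * d_model

    return (n_layers * per_layer + token_embeddings
            + position_embeddings + final_layer_norm)


total = gpt2_style_param_count(
    n_layers=24, d_model=1024,
    vocab_size=42_384,   # assumption, not stated in the text above
    n_positions=1024,    # assumption, not stated in the text above
)
print(f"{total:,} parameters (~{total / 1e6:.0f}M)")
```

Under these assumptions the estimate comes to 346,761,216, which rounds to the 347 million quoted. Most of the total sits in the 24 decoder blocks, and the in-domain vocabulary accounts for about 43 million parameters.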

Applications

BioGPT supports biomedical researchers and NLP practitioners who need to extract structured knowledge from, or generate text grounded in, the published literature. Typical uses include mining drug–target interactions, drug–drug interactions, and chemical–disease relations; answering research-style biomedical questions; classifying abstracts by topic; and producing fluent descriptions of biomedical entities. Because the weights are openly available under a permissive license, it serves both as a ready-to-use model and as a strong initialization for fine-tuning on specialized literature-mining pipelines, clinical-adjacent text tasks, and knowledge-base construction.
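When such a model is used to mine relations, the extracted triples are usually scored against gold annotations. The sketch below computes micro-averaged precision, recall, and F1 with exact triple matching. This is one common convention and an assumption here: the exact evaluation protocol behind the benchmark numbers reported for BioGPT may differ, and the function name and toy triples are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, tail entity, relation)


def triple_f1(predicted: List[Set[Triple]],
              gold: List[Set[Triple]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over per-document triple sets.

    A predicted triple counts as correct only if it matches a gold
    triple for the same document exactly.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same documents")

    true_positives = sum(len(p & g) for p, g in zip(predicted, gold))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)

    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy example: two documents, one correct prediction out of two.
predicted = [{("a", "b", "inhibits"), ("c", "d", "inhibits")}, set()]
gold = [{("a", "b", "inhibits")}, {("e", "f", "activates")}]
precision, recall, f1 = triple_f1(predicted, gold)
```

Exact matching is strict: a triple with the right entities but a differently spelled entity name counts as both a false positive and a false negative, so normalizing entity strings before scoring is often worthwhile.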

Impact

BioGPT was an influential early demonstration that generative pretrained transformers could match or surpass encoder-based models on core biomedical NLP benchmarks, helping shift the field toward generative and prompt-based formulations that later became standard with the rise of large language models. Its open release on HuggingFace led to wide adoption as a baseline and starting point for biomedical text-mining research. As a relatively small model trained on abstracts rather than full texts, it has clear limitations, including susceptibility to factual hallucination and weaker performance than much larger general-purpose LLMs on open-ended generation. It nonetheless remains a well-cited reference point and a practical, lightweight option for domain-specific literature tasks.

Citation

BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining

Luo, R., et al. (2022). BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining. Briefings in Bioinformatics.

DOI: 10.1093/bib/bbac409

Recent citations

Papers that recently cited this model.

HiMANER: A hierarchical multi-scale adaptive model for named entity recognition and assertion status detection in clinical texts
Vijaya Madhavi Lakshmi. Challa, Ramakrishnudu Tene
Biomedical Signal Processing and Control · Oct 2026
Citations: 0

Hierarchical multi-type annotation fusion with uncertainty-aware cross-attention for chest X-ray classification
S. Thota, Fayadh S. Alenezi, Kemal Polat, et al.
Applied Soft Computing · Oct 2026
Citations: 0

MKDS: Multi-source knowledge-driven data synthesis framework for effective domain adaptation of large language models
Qihuang Zhong, Jinzhao Gong, K. Zhu, et al.
Knowledge-Based Systems · Sep 2026
Citations: 0

Top citations

The most-cited papers that cite this model.

Large language models encode clinical knowledge
K. Singhal, Shekoofeh Azizi, T. Tu, et al.
Nature · Dec 2022
Citations: 4.4K

ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope
P. Ray
Internet of Things and Cyber-Physical Systems · Apr 2023
Citations: 2.1K

LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li, Cliff Wong, Sheng Zhang, et al.
Neural Information Processing Systems · Jun 2023
Citations: 1.8K

BloombergGPT: A Large Language Model for Finance
Shijie Wu, Ozan Irsoy, Steven Lu, et al.
arXiv.org · Mar 2023
Citations: 1.3K · Influential

Autonomous chemical research with large language models
Daniil A. Boiko, R. MacKnight, Benjamin C Kline, et al.
Nature · Dec 2023
Citations: 972

Citations

Total Citations: 1.5K

Influential: 95

References: 59

GitHub

Stars: 4.5K

Forks: 481

Open Issues: 75

Contributors: 9

Last Push: 2y ago

Language: Python

License: MIT

HuggingFace

Downloads: 103.4K

Likes: 307

Last Modified: 3y ago

Pipeline: text-generation

Fields of citing research

Computer Science: 32%
Medicine: 25%
Biology: 4%
Engineering: 2%
Linguistics: 2%
Chemistry: 1%
Environmental Science: 1%
Education: 1%

Share of papers citing this model.

Openness

bio.rodeo openness: Open weights (open weights, closed recipe)

Score: 66 · Partial

Usability (can I run it?): 87

Reproducibility (can I retrain it?): 44

Model Openness Framework: Unclassified (restrictive license on core components)

Resources

GitHub Repository · Research Paper · HuggingFace Model
