Microsoft Research Asia / Microsoft Research
A GPT-2-based generative transformer pretrained on 15M PubMed abstracts for biomedical text generation and mining, including relation extraction and question answering.
BioGPT is a domain-specific generative language model for biomedical text, developed by Microsoft Research and published in Briefings in Bioinformatics in late 2022. While earlier biomedical language models such as BioBERT and PubMedBERT adapted the encoder-only BERT architecture and excelled at discriminative tasks like classification and named-entity recognition, they were poorly suited to natural-language generation. BioGPT closes that gap by bringing a decoder-only, GPT-style generative model into the biomedical domain.
The model is pretrained from scratch on a large corpus of PubMed abstracts, learning the vocabulary, phrasing, and factual associations characteristic of the biomedical literature. This generative formulation lets BioGPT handle a wide range of tasks—relation extraction, question answering, document classification, and free-form text generation—within a single autoregressive framework, typically by casting structured outputs as generated text under task-specific prompts.
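One way to picture "casting structured outputs as generated text" is a round trip between relation triples and a sentence-like target string. The sketch below is a minimal illustration and not the authors' code: the template wording, the `Triple` type, the `linearize` and `parse` helpers, and the example triples are all hypothetical, chosen only to show the idea.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    head: str      # e.g. a drug
    relation: str  # e.g. an interaction type
    tail: str      # e.g. a target


# Hypothetical template (an assumption, not BioGPT's actual target format):
# each triple becomes one clause of the text the model is asked to generate.
TEMPLATE = "the relation between {head} and {tail} is {relation}"
PATTERN = re.compile(
    r"the relation between (?P<head>.+?) and (?P<tail>.+?) is (?P<relation>.+)"
)
SEPARATOR = "; "


def linearize(triples: list[Triple]) -> str:
    """Turn triples into a single target string for an autoregressive model."""
    return SEPARATOR.join(TEMPLATE.format(**t._asdict()) for t in triples)


def parse(generated: str) -> list[Triple]:
    """Recover triples from generated text, skipping clauses that do not match."""
    triples = []
    for clause in generated.split(SEPARATOR):
        match = PATTERN.fullmatch(clause.strip().rstrip("."))
        if match:
            triples.append(Triple(match["head"], match["relation"], match["tail"]))
    return triples


# Illustrative data only.
gold = [
    Triple("aspirin", "inhibitor", "COX-1"),
    Triple("warfarin", "substrate", "CYP2C9"),
]
target_text = linearize(gold)
recovered = parse(target_text)
```

A template like this is fragile when entity names themselves contain " and " or " is ", which is one reason the exact wording of the target sequence matters in practice.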
Developed by Renqian Luo, Yingce Xia, Tao Qin, Tie-Yan Liu and collaborators, BioGPT was among the first generative pretrained transformers tailored specifically to biomedicine, and it helped establish generative modeling as a viable approach for biomedical text mining alongside the then-dominant encoder-based models.
BioGPT adopts the GPT-2 architecture. The base model is a 24-layer Transformer decoder with 1024-dimensional hidden states and 16 attention heads, totaling roughly 347 million parameters, and it uses a Byte-Pair Encoding vocabulary learned on the in-domain corpus. It is pretrained with a standard autoregressive language-modeling objective on approximately 15 million PubMed entries (titles and abstracts) collected up to 2021. A larger variant, BioGPT-Large, scales up to the GPT-2 XL configuration (~1.5B parameters). On downstream evaluations, the authors report F1 scores of 44.98% on BC5CDR, 38.42% on KD-DTI, and 40.76% on DDI for relation extraction, 78.2% accuracy on PubMedQA, and competitive results on the HoC document-classification task. They describe these results as matching or exceeding prior biomedical language models on these benchmarks.
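The "roughly 347 million parameters" figure can be sanity-checked with back-of-the-envelope arithmetic for a GPT-2-style decoder. The sketch below is an estimate under stated assumptions, not a readout of the released weights: the layer count and hidden size come from the description above, while the vocabulary size (42,384), the 1,024 learned positions, the 4x feed-forward expansion, and tied input/output embeddings are assumptions not stated in this text.

```python
def gpt2_style_param_count(
    n_layers: int, d_model: int, vocab_size: int, n_positions: int
) -> int:
    """Estimate parameters of a GPT-2-style decoder with tied embeddings."""
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d_model + n_positions * d_model
    # Q, K, V and output projections, each d_model x d_model with a bias.
    attention = 4 * d_model * d_model + 4 * d_model
    # Two linear layers with a 4x expansion, plus their biases.
    mlp = 2 * (d_model * 4 * d_model) + 4 * d_model + d_model
    # Two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    per_layer = attention + mlp + layer_norms
    final_norm = 2 * d_model
    return embeddings + n_layers * per_layer + final_norm


# n_layers and d_model follow the text; vocab_size and n_positions are assumptions.
total = gpt2_style_param_count(
    n_layers=24, d_model=1024, vocab_size=42_384, n_positions=1024
)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate lands close to the quoted figure, with the 24 decoder blocks accounting for most of the total and the embeddings for the remainder.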
BioGPT supports biomedical researchers and NLP practitioners who need to extract structured knowledge from, or generate text grounded in, the published literature. Typical uses include mining drug–target interactions, drug–drug interactions, and chemical–disease relations; answering research-style biomedical questions; classifying abstracts by topic; and producing fluent descriptions of biomedical entities. Because the weights are openly available under a permissive license, it serves both as a ready-to-use model and as a strong initialization for fine-tuning on specialized literature-mining pipelines, clinical-adjacent text tasks, and knowledge-base construction.
BioGPT was an influential early demonstration that generative pretrained transformers could match or surpass encoder-based models on core biomedical NLP benchmarks, helping shift the field toward generative and prompt-based formulations that later became standard with the rise of large language models. Its open release on Hugging Face led to wide adoption as a baseline and starting point for biomedical text-mining research. As a relatively small model trained on abstracts rather than full texts, it has clear limitations, including susceptibility to factual hallucination and weaker performance than much larger general-purpose LLMs on open-ended generation. It nonetheless remains a well-cited reference point and a practical, lightweight option for domain-specific literature tasks.
Luo, R., et al. (2022). BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining. Briefings in Bioinformatics.
DOI: 10.1093/bib/bbac409