bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
DNA & Gene · Language model

BacteReason

University of Tokyo

A reasoning LLM fine-tuned on clinical antimicrobial-susceptibility data augmented with mechanistic rationales, predicting susceptibility with explanations for novel isolate-antibiotic pairs.

Released: June 2026

BacteReason is a reasoning-oriented large language model (LLM) for predicting the antimicrobial susceptibility of clinical bacterial isolates while supplying mechanistic explanations for each prediction. It addresses a persistent gap in computational antimicrobial-resistance (AMR) work: most predictors output a label (susceptible or resistant) without a transparent rationale, which limits clinical trust and makes failures hard to diagnose. By coupling susceptibility prediction with human-readable mechanistic reasoning, BacteReason aims to make AMR predictions both more accurate and more interpretable.

The model was developed by Koji Tsuda's group at the University of Tokyo (Oikawa, Kawashima, Kinjo, Demizu, and Tamura) and released as a bioRxiv preprint on June 7, 2026. Its central idea is to fine-tune an open-weight base LLM not just on susceptibility outcomes but on those outcomes paired with mechanistic rationales explaining why a given organism is or is not susceptible to a given antibiotic. The rationales are generated automatically by a teacher LLM that queries biomedical knowledge graphs through TogoMCP, a Model Context Protocol interface to biomedical data resources.

BacteReason sits at the intersection of clinical microbiology, AMR genomics, and LLM reasoning. It is distinct from BacPT, a separate bacterial model, and represents a knowledge-distillation approach in which structured biomedical knowledge is injected into an LLM via teacher-generated reasoning traces rather than through architectural changes.

#Key Features

  • Mechanistic reasoning with predictions: Rather than emitting a bare susceptible/resistant label, the model produces a mechanistic explanation alongside each prediction, improving interpretability for clinical and research use.
  • Knowledge-graph-grounded rationales: Training rationales are generated by a teacher LLM that accesses biomedical knowledge graphs through TogoMCP, grounding the reasoning in curated biomedical knowledge rather than the base model's parametric memory alone.
  • Fine-tuned checkpoint, not retrieval at inference: Once fine-tuned, a fixed checkpoint is queried directly with new isolate-and-antibiotic combinations, so predictions do not require live knowledge-graph access at inference time.
  • Strong extrapolation gains: On an extrapolation benchmark covering isolate-antibiotic combinations not seen during training, BacteReason is reported to achieve a 43% improvement over the untuned base model. This summary does not state the metric or whether the figure is a relative or an absolute gain.
  • Built on open-weight foundations: The approach fine-tunes an open-weight base LLM, making the methodology reproducible and adaptable to other biomedical reasoning tasks.
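The extrapolation setting described above, evaluating on isolate-antibiotic combinations absent from training, can be illustrated with a pair-level holdout split. This is a minimal sketch under stated assumptions, not the benchmark construction used in the preprint: the record fields (`isolate_id`, `organism`, `antibiotic`, `label`), the function name `split_by_unseen_pairs`, the holdout fraction, and the toy records are all hypothetical placeholders.

```python
import random
from collections import defaultdict


def split_by_unseen_pairs(records, holdout_fraction=0.2, seed=0):
    """Hold out whole (organism, antibiotic) pairs so that no test pair
    ever appears in the training set. Hypothetical helper for illustration."""
    by_pair = defaultdict(list)
    for rec in records:
        by_pair[(rec["organism"], rec["antibiotic"])].append(rec)

    # Sort before shuffling so the split is deterministic for a given seed.
    pairs = sorted(by_pair)
    random.Random(seed).shuffle(pairs)

    n_holdout = max(1, round(len(pairs) * holdout_fraction))
    holdout_pairs = set(pairs[:n_holdout])

    train = [r for p in pairs if p not in holdout_pairs for r in by_pair[p]]
    test = [r for p in pairs if p in holdout_pairs for r in by_pair[p]]
    return train, test


# Toy placeholder records (not real clinical data).
records = [
    {"isolate_id": 1, "organism": "organism_A", "antibiotic": "drug_1", "label": "resistant"},
    {"isolate_id": 2, "organism": "organism_A", "antibiotic": "drug_1", "label": "susceptible"},
    {"isolate_id": 3, "organism": "organism_A", "antibiotic": "drug_2", "label": "susceptible"},
    {"isolate_id": 4, "organism": "organism_B", "antibiotic": "drug_1", "label": "resistant"},
    {"isolate_id": 5, "organism": "organism_B", "antibiotic": "drug_2", "label": "susceptible"},
    {"isolate_id": 6, "organism": "organism_C", "antibiotic": "drug_3", "label": "resistant"},
]

train, test = split_by_unseen_pairs(records)
train_pairs = {(r["organism"], r["antibiotic"]) for r in train}
test_pairs = {(r["organism"], r["antibiotic"]) for r in test}
print(f"{len(train)} training records, {len(test)} held-out records")
print("pair overlap:", train_pairs & test_pairs)
```

Splitting at the pair level rather than the record level is what makes this an extrapolation test: a random record-level split could place two isolates of the same organism-drug pair on both sides and overstate generalization.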

#Technical Details

BacteReason is produced by supervised fine-tuning of an open-weight base LLM (the specific base model is not stated in the preprint abstract). The training data consist of clinical bacterial antimicrobial-susceptibility records augmented with mechanistic rationales. These rationales are synthesized by a teacher LLM that queries biomedical knowledge graphs through TogoMCP, a Model Context Protocol server exposing biomedical resources. This is a knowledge-distillation setup: the teacher's knowledge-grounded reasoning becomes the training signal for the student model. After fine-tuning, the checkpoint is fixed and queried with new isolate-and-antibiotic pairs to predict susceptibility together with an explanation. On an extrapolation benchmark designed to test generalization to unseen organism-drug combinations, the authors report a 43% improvement over the untuned baseline, which suggests that reasoning-augmented fine-tuning improves out-of-distribution performance rather than merely memorizing training pairs.
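To make the data flow concrete, the sketch below shows one way a susceptibility record and its teacher-generated rationale might be packed into a prompt/completion pair for supervised fine-tuning, and how a label could be read back out of a generated answer. The prompt template, the field names, and the helpers `to_sft_example` and `parse_prediction` are assumptions made for illustration; the preprint's actual data format is not described in this summary, and the rationale text is left as a placeholder.

```python
import json
from dataclasses import dataclass


@dataclass
class SusceptibilityRecord:
    organism: str
    antibiotic: str
    label: str      # "susceptible" or "resistant"
    rationale: str  # mechanistic explanation produced by the teacher LLM


def to_sft_example(rec: SusceptibilityRecord) -> dict:
    """Format one record as a prompt/completion pair (hypothetical template)."""
    prompt = (
        f"Organism: {rec.organism}\n"
        f"Antibiotic: {rec.antibiotic}\n"
        "Predict susceptibility and explain the mechanism."
    )
    # The rationale comes first so the student learns to reason, then answer.
    completion = f"Reasoning: {rec.rationale}\nPrediction: {rec.label}"
    return {"prompt": prompt, "completion": completion}


def parse_prediction(generated_text: str):
    """Extract the label from the last 'Prediction:' line, or None if absent."""
    for line in reversed(generated_text.splitlines()):
        if line.startswith("Prediction:"):
            return line.split(":", 1)[1].strip()
    return None


record = SusceptibilityRecord(
    organism="organism_A",
    antibiotic="drug_1",
    label="resistant",
    rationale="<teacher-generated mechanistic rationale>",
)
example = to_sft_example(record)
jsonl_line = json.dumps(example)  # one line of a JSONL training file
print(jsonl_line)
print(parse_prediction(example["completion"]))
```

Because the rationale is baked into the training target, nothing in this format requires knowledge-graph access when the fine-tuned checkpoint is later queried, which matches the fixed-checkpoint inference described above.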

#Applications

BacteReason targets clinical microbiology and antimicrobial stewardship, where predicting whether a bacterial isolate will respond to a given antibiotic is a routine but consequential decision. The mechanistic explanations accompanying each prediction make the model useful as a decision-support aid for clinicians and microbiologists who need to understand the basis of a prediction, and as a hypothesis-generation tool for AMR researchers studying resistance mechanisms. The teacher-distillation methodology is also broadly applicable: the same pattern of grounding LLM reasoning in biomedical knowledge graphs via TogoMCP could be extended to other prediction tasks in clinical genomics and microbiology.

#Impact

By demonstrating that knowledge-graph-grounded teacher rationales can be distilled into an open-weight LLM to deliver both accuracy gains and interpretability, BacteReason offers a template for building trustworthy biomedical reasoning models. The reported 43% extrapolation improvement over an untuned baseline suggests reasoning-augmented fine-tuning is a promising direction for AMR prediction, where generalization to novel isolate-antibiotic combinations is essential. As a recent preprint without a publicly released code repository or model weights at the time of writing, its real-world adoption and independent validation remain to be established, and the unspecified base model limits direct reproducibility for now.
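How much the 43% figure means depends on whether it is a relative or an absolute gain, which this summary does not specify. The short sketch below works through both readings for a hypothetical baseline score; the baseline value of 0.50 and the function names are assumptions chosen only to show the arithmetic, not results from the preprint.

```python
def relative_improvement(baseline: float, tuned: float) -> float:
    """Gain expressed as a fraction of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive for a relative gain")
    return (tuned - baseline) / baseline


def absolute_improvement(baseline: float, tuned: float) -> float:
    """Gain expressed in raw score points."""
    return tuned - baseline


baseline_score = 0.50  # hypothetical baseline, for illustration only

# Reading 1: 43% relative gain, so the tuned score is baseline * 1.43.
tuned_if_relative = baseline_score * 1.43
# Reading 2: 43 percentage points absolute gain.
tuned_if_absolute = baseline_score + 0.43

print(f"relative reading: tuned score ~ {tuned_if_relative:.3f}")
print(f"absolute reading: tuned score ~ {tuned_if_absolute:.3f}")
print(f"check relative: {relative_improvement(baseline_score, tuned_if_relative):.2f}")
print(f"check absolute: {absolute_improvement(baseline_score, tuned_if_absolute):.2f}")
```

The two readings diverge more as the baseline gets lower, so the metric and baseline value from the preprint itself are needed before comparing this result with other AMR predictors.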

Citation

Oikawa, Y., et al. (2026) BacteReason: A Reasoning Model for Antimicrobial Resistance Prediction. openRxiv.

DOI: 10.64898/2026.06.04.730229

Openness

bio.rodeo openness: Closed · low usability and reproducibility
Score: 20 · Closed
Usability (can I run it?): 20
Reproducibility (can I retrain it?): 3
Model Openness Framework: Unclassified (missing required components)

Tags

antimicrobial_resistance, antimicrobial_resistance_prediction, bacteria, drug_susceptibility_prediction, fine_tuned, knowledge_distillation, language_model, transformer

Resources

Research Paper