bio.rodeo

The authoritative source for evaluating biological foundation models. No hype, just honest analysis.

© 2026 Pulsatance. All rights reserved.
Built by Pulsatance

OPERA

University of Cambridge

Open respiratory acoustic foundation models pretrained on ~136K curated cough and breathing recordings for health tasks such as disease detection and lung function estimation.

Released: June 2024

Respiratory sounds — coughs, breaths, and exhalations — carry clinically useful information about lung and airway health, and the proliferation of smartphones and wearables has made such audio cheap to collect at scale. Yet most respiratory audio models are trained from scratch on small, narrowly labelled datasets for a single task (for example COVID-19 screening), which limits their accuracy and generalisability. OPERA (OPEn Respiratory Acoustic foundation models) tackles this by pretraining general-purpose encoders on large volumes of unlabelled respiratory audio, producing reusable representations that can be adapted to many downstream health tasks.

OPERA was developed by Yuwei Zhang, Tong Xia, Jing Han, Cecilia Mascolo and colleagues in the Mobile Systems group at the University of Cambridge, and presented at NeurIPS 2024 (preprint June 2024). Beyond releasing models, the authors contribute an open framework: a curated pretraining corpus aggregated from public respiratory-audio sources, three pretrained foundation models, and a benchmark of 19 downstream health tasks for standardised evaluation.

The project is deliberately open — code, curated data pipelines, pretrained checkpoints, and the evaluation suite are all released — to give the respiratory health community a common, reproducible starting point rather than a collection of isolated, task-specific models.

# Key Features

  • Respiratory-specific pretraining: Encoders are pretrained on cough and breathing audio rather than general environmental or music audio, yielding representations better matched to health applications.
  • Three model variants: OPERA-CT (a contrastive transformer), OPERA-CE (a contrastive encoder with an efficient CNN backbone), and OPERA-GT (a generative transformer autoencoder), spanning contrastive and reconstructive self-supervised objectives.
  • Open benchmark of 19 tasks: A standardised evaluation suite covering disease detection (COVID-19, COPD, smoker status), lung-function estimation, and other respiratory health endpoints across multiple public datasets.
  • Strong, generalisable performance: OPERA models outperform general-audio pretrained baselines on 16 of 19 tasks and transfer to unseen datasets and new respiratory sound types.
  • Fully released artefacts: Code (MIT), curated pretraining pipeline, pretrained weights, and the benchmark are publicly available for reuse and extension.
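To make the contrastive objective behind the OPERA-CT and OPERA-CE variants more concrete, here is a minimal sketch of a generic InfoNCE-style loss. It is an illustration only: the function names, the cosine similarity, and the temperature value are assumptions, and the OPERA paper should be consulted for the exact loss and pairing strategy the authors use.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    anchors[i] should match positives[i]; every other positive in the
    batch acts as a negative for anchor i. The temperature is an
    assumed hyperparameter, not a value taken from the paper.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy 2-D "embeddings": two segments per recording, two recordings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[0.9, 0.1], [0.1, 0.9]])   # correct pairing
swapped = info_nce(anchors, [[0.1, 0.9], [0.9, 0.1]])   # wrong pairing
```

The loss is small when each anchor is closest to its own positive and large when the pairing is swapped, which is the signal that pushes segments from the same recording together in embedding space.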

# Technical Details

OPERA curates roughly 136,000 respiratory audio samples totalling about 440 hours of cough and breathing recordings, drawn from public sources including COVID-19 Sounds, UK COVID-19, CoughVID, ICBHI, HF Lung, Coswara, KAUH, and others. Three encoders are pretrained with self-supervised objectives: OPERA-CT and OPERA-CE use contrastive learning (transformer and efficient-CNN backbones, respectively), while OPERA-GT is a generative transformer autoencoder. The encoders operate on spectrogram inputs of fixed-length audio segments (around 8 seconds) and produce 768-dimensional feature embeddings used for downstream linear probing and fine-tuning. Across the 19-task benchmark, the OPERA models surpass general-audio foundation models (such as those pretrained on AudioSet) on 16 tasks, with contrastive and generative variants showing complementary strengths across classification and regression endpoints. Parameter counts for the individual encoders are not stated in the paper.
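The corpus figures above imply an average recording length of roughly 11.6 seconds (440 hours × 3600 / 136,000 samples), so recordings generally need to be cut or padded to the fixed segment length. The sketch below shows one way to do that. The 16 kHz sample rate, the zero-padding of the final segment, and the function name are assumptions for illustration, not details confirmed by the source.

```python
def split_into_segments(samples, sample_rate, segment_seconds=8.0):
    """Split a 1-D sequence of audio samples into fixed-length segments.

    The last segment is zero-padded to full length (an assumed policy;
    a real pipeline might instead drop or repeat-pad short tails).
    """
    segment_length = int(round(segment_seconds * sample_rate))
    segments = []
    for start in range(0, len(samples), segment_length):
        chunk = list(samples[start:start + segment_length])
        chunk.extend([0.0] * (segment_length - len(chunk)))
        segments.append(chunk)
    return segments

# Average recording length implied by the stated corpus size.
mean_seconds = 440 * 3600 / 136_000

# A silent 20-second recording at an assumed 16 kHz sample rate.
SAMPLE_RATE = 16_000
recording = [0.0] * (20 * SAMPLE_RATE)
segments = split_into_segments(recording, SAMPLE_RATE)
segment_lengths = {len(s) for s in segments}
```

Each resulting segment would then be converted to a spectrogram before being passed to an encoder.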

# Applications

OPERA targets researchers and developers building respiratory health tools from acoustic data, particularly in mobile and remote-monitoring settings where audio can be captured passively on consumer devices. The pretrained encoders can be adapted — typically via lightweight linear probing or fine-tuning — to tasks such as COVID-19 and COPD detection, smoker classification, and lung-function estimation, lowering the data and compute barrier for groups without large labelled cohorts. The accompanying benchmark also serves as a shared yardstick for comparing new respiratory-audio methods.
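Linear probing, as mentioned above, means training only a small linear classifier on top of frozen encoder outputs. The following is a minimal, dependency-free sketch under stated assumptions: the embeddings are synthetic random vectors (16-dimensional here to keep the example fast, whereas the text above describes 768-dimensional features), and `train_linear_probe`, `predict`, and `fake_embedding` are hypothetical helpers, not part of the OPERA codebase.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_linear_probe(embeddings, labels, lr=0.5, epochs=200):
    """Fit logistic regression on frozen embeddings with batch gradient descent."""
    dim = len(embeddings[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(embeddings)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y
            for j in range(dim):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the linear score is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Synthetic stand-ins for frozen encoder outputs (NOT real OPERA features):
# the two classes differ only in the mean of the first dimension.
random.seed(0)
DIM = 16

def fake_embedding(label):
    shift = 1.0 if label == 1 else -1.0
    return [random.gauss(shift if j == 0 else 0.0, 0.3) for j in range(DIM)]

labels = [i % 2 for i in range(40)]
embeddings = [fake_embedding(y) for y in labels]
weights, bias = train_linear_probe(embeddings, labels)
accuracy = sum(predict(weights, bias, x) == y
               for x, y in zip(embeddings, labels)) / len(labels)
```

Because the encoder stays frozen, only `DIM + 1` parameters are trained, which is why this approach suits groups with small labelled cohorts. In practice an off-the-shelf logistic regression implementation and a held-out test split would replace this hand-rolled loop.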

# Impact

OPERA is among the first openly released foundation-model efforts dedicated to respiratory acoustics, and it establishes both reusable encoders and a common benchmark for a field that had been fragmented across bespoke, single-task models. By demonstrating that domain-specific pretraining beats general-audio models on most health tasks and generalises to unseen data, it strengthens the case for specialised audio foundation models in health. A practical caveat is the licensing split: the code is released under the permissive MIT licence, but the pretrained weights on Hugging Face are CC-BY-NC-4.0, which restricts their use to non-commercial purposes. Evaluation also remains observational, so prospective clinical validation is required before deployment.

Citation

Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

Preprint

Zhang, Y., et al. (2024) Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking. Neural Information Processing Systems.

DOI: 10.48550/arXiv.2406.16148

Citations

Total Citations: 41
Influential: 9
References: 76

GitHub

Stars: 80
Forks: 19
Open Issues: 4
Contributors: 2
Last Push: 1y ago
Language: Python
License: MIT

HuggingFace

Downloads: 0
Likes: 2
Last Modified: 1y ago

Openness

bio.rodeo openness
Fully open · usable and reproducible
59 Partial
Usability (can I run it?): 69
Reproducibility (can I retrain it?): 57
Model Openness Framework: Unclassified (restrictive license on core components)

Tags

autoencoder, contrastive_learning, cough, disease_detection, foundation_model, representation_learning, respiratory_audio, respiratory_health_screening, self_supervised, transformer

Resources

GitHub Repository · Research Paper · HuggingFace Model