OPERA

Respiratory acoustic foundation models pretrained on roughly 136K cough and breathing recordings for disease detection and lung function estimation.

Released: June 2024

Respiratory sounds — coughs, breaths, and exhalations — carry clinically useful information about lung and airway health, and the proliferation of smartphones and wearables has made such audio cheap to collect at scale. Yet most respiratory audio models are trained from scratch on small, narrowly labelled datasets for a single task (for example COVID-19 screening), which limits their accuracy and generalisability. OPERA (OPEn Respiratory Acoustic foundation models) tackles this by pretraining general-purpose encoders on large volumes of unlabelled respiratory audio, producing reusable representations that can be adapted to many downstream health tasks.

OPERA was developed by Yuwei Zhang, Tong Xia, Jing Han, Cecilia Mascolo and colleagues in the Mobile Systems group at the University of Cambridge, and presented at NeurIPS 2024 (preprint June 2024). Beyond releasing models, the authors contribute an open framework: a curated pretraining corpus aggregated from public respiratory-audio sources, three pretrained foundation models, and a benchmark of 19 downstream health tasks for standardised evaluation.

The project is deliberately open — code, curated data pipelines, pretrained checkpoints, and the evaluation suite are all released — to give the respiratory health community a common, reproducible starting point rather than a collection of isolated, task-specific models.

Key Features

Respiratory-specific pretraining: Encoders are pretrained on cough and breathing audio rather than general environmental or music audio, yielding representations better matched to health applications.
Three model variants: OPERA-CT (a contrastive transformer), OPERA-CE (a contrastive encoder with an efficient CNN backbone), and OPERA-GT (a generative transformer autoencoder), spanning contrastive and reconstructive self-supervised objectives.
Open benchmark of 19 tasks: A standardised evaluation suite covering disease detection (COVID-19, COPD, smoker status), lung-function estimation, and other respiratory health endpoints across multiple public datasets.
Strong, generalisable performance: OPERA models outperform general-audio pretrained baselines on 16 of 19 tasks and transfer to unseen datasets and new respiratory sound types.
Fully released artefacts: Code (MIT), curated pretraining pipeline, pretrained weights, and the benchmark are publicly available for reuse and extension.

Technical Details

OPERA curates roughly 136,000 respiratory audio samples totalling about 440 hours of cough and breathing recordings, drawn from public sources including COVID-19 Sounds, UK COVID-19, CoughVID, ICBHI, HF Lung, Coswara, KAUH, and others. Three encoders are pretrained with self-supervised objectives: OPERA-CT and OPERA-CE use contrastive learning (transformer and efficient-CNN backbones, respectively), while OPERA-GT is a generative transformer autoencoder. The encoders operate on spectrogram inputs of fixed-length audio segments (around 8 seconds) and produce 768-dimensional feature embeddings used for downstream linear probing and fine-tuning. Across the 19-task benchmark, the OPERA models surpass general-audio foundation models (such as those pretrained on AudioSet) on 16 tasks, with contrastive and generative variants showing complementary strengths across classification and regression endpoints. Parameter counts for the individual encoders are not stated in the paper.
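
The fixed-length input described above can be illustrated with a small preprocessing sketch. This is not the OPERA pipeline itself: the 16 kHz sample rate, the non-overlapping windows, the zero-padding of the final segment, and the function name `split_into_segments` are all assumptions made for illustration. Only the segment length of around 8 seconds comes from the description.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 16_000  # assumed sample rate, not stated in the text above
SEGMENT_SECONDS = 8.0    # "around 8 seconds" per the description above


def split_into_segments(
    waveform: Sequence[float],
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    segment_seconds: float = SEGMENT_SECONDS,
) -> List[List[float]]:
    """Cut a mono waveform into equal-length segments, zero-padding the last one."""
    segment_len = int(round(sample_rate_hz * segment_seconds))
    if segment_len <= 0:
        raise ValueError("segment length must be positive")
    samples = list(waveform)
    segments: List[List[float]] = []
    for start in range(0, len(samples), segment_len):
        chunk = samples[start:start + segment_len]
        # Zero-pad a short final chunk so every segment has the same length.
        chunk.extend([0.0] * (segment_len - len(chunk)))
        segments.append(chunk)
    return segments


# A 20-second dummy recording yields three 8-second segments; the last is padded.
recording = [0.1] * (20 * SAMPLE_RATE_HZ)
segments = split_into_segments(recording)
```

Each equal-length segment would then be converted to a spectrogram before being passed to an encoder. That step is omitted here because its parameters are not given in the text.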

Applications

OPERA targets researchers and developers building respiratory health tools from acoustic data, particularly in mobile and remote-monitoring settings where audio can be captured passively on consumer devices. The pretrained encoders can be adapted — typically via lightweight linear probing or fine-tuning — to tasks such as COVID-19 and COPD detection, smoker classification, and lung-function estimation, lowering the data and compute barrier for groups without large labelled cohorts. The accompanying benchmark also serves as a shared yardstick for comparing new respiratory-audio methods.
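
Linear probing, as mentioned above, means keeping the pretrained encoder frozen and training only a linear classifier on its embeddings. The following is a minimal sketch of that idea using plain logistic regression. It does not use the OPERA code or weights: the embeddings are synthetic random vectors, the 16-dimensional size is chosen only to keep the demo fast (a real embedding would be far larger), and the names `train_linear_probe`, `predict`, and `make_synthetic` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

EMBED_DIM = 16  # illustrative size only; real encoder embeddings are much larger


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_probe(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    epochs: int = 100,
    lr: float = 0.1,
) -> Tuple[List[float], float]:
    """Fit logistic regression on frozen embeddings with batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)


def make_synthetic(n: int, rng: random.Random) -> Tuple[List[List[float]], List[int]]:
    # Stand-in for encoder outputs: two Gaussian clusters, one per class.
    xs, ys = [], []
    for i in range(n):
        y = i % 2
        mean = 1.0 if y else -1.0
        xs.append([rng.gauss(mean, 1.0) for _ in range(EMBED_DIM)])
        ys.append(y)
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_synthetic(200, rng)
test_x, test_y = make_synthetic(100, rng)
weights, bias = train_linear_probe(train_x, train_y)
correct = sum(predict(weights, bias, x) == y for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
```

Because only the linear layer is trained, this kind of probe needs little labelled data and compute, which is the appeal for groups without large cohorts. In practice the synthetic vectors would be replaced by embeddings extracted once from a frozen encoder.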

Impact

OPERA is among the first openly released foundation-model efforts dedicated to respiratory acoustics, and it establishes both reusable encoders and a common benchmark for a field that had been fragmented across bespoke, single-task models. By demonstrating that domain-specific pretraining beats general-audio models on most health tasks and generalises to unseen data, it strengthens the case for specialised audio foundation models in health. A practical caveat is the licensing split: the code is released under the permissive MIT licence, but the pretrained weights on Hugging Face are CC-BY-NC-4.0, restricting their use to non-commercial purposes. Evaluation also remains observational, requiring prospective clinical validation before deployment.

Citation

Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking

Preprint

Zhang, Y., et al. (2024) Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking. Neural Information Processing Systems.

DOI: 10.48550/arXiv.2406.16148

Recent citations

Papers that recently cited this model.

CaReCoS: A Spectrogram based Visual Benchmark for Cardiac, Respiratory and Cough Sounds
Harshit Rajgarhia, Shuubham Ojha, Akhil Pothanapalli, et al.
Jul 2026
Citations: 0
BCoughBench: Benchmarking Respiratory Acoustic Foundation Models Under Body-Coupled Wearable Sensor Conditions
M. Sanap, P. Desikan, Edgar J. Lobaton
Jun 2026
Citations: 0 (Influential)
CoughPhase-CLR: Designing an acoustics-informed foundation model for coughing sound classification
M. Moldovan, A. Batliner, Thomas M. Berghaus, et al.
Jun 2026
Citations: 0 (Influential)

Top citations

The most-cited papers that cite this model.

RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction
Yuwei Zhang, Tong Xia, Aaqib Saeed, et al.
ML4H@NeurIPS · Oct 2024
Citations: 19 (Influential)
Oxidative Stress and Inflammation in Hypoxemic Respiratory Diseases and Their Comorbidities: Molecular Insights and Diagnostic Advances in Chronic Obstructive Pulmonary Disease and Sleep Apnea
J. Rodríguez-Pérez, R. Andreu-Martínez, Roberto Daza, et al.
Antioxidants · Jul 2025
Citations: 15
ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems
Bufang Yang, Lilin Xu, Liekang Zeng, et al.
arXiv.org · 2025
Citations: 15
CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
Tsai-Ning Wang, Lin-Lin Chen, Neil Zeghidour, et al.
ACM Conference on Health, Inference, and Learning · May 2025
Citations: 7
Shifting the Paradigm: A Diffeomorphism Between Time Series Data Manifolds for Achieving Shift-Invariancy in Deep Learning
B. U. Demirel, Christian Holz
International Conference on Learning Representations · Feb 2025
Citations: 6

Citations

Total Citations: 46
Influential: 12
References: 76

GitHub

Stars: 83
Forks: 19
Open Issues: 4
Contributors: 2
Last Push: 1y ago
Language: Python
License: MIT

HuggingFace

Downloads: 0
Likes: 2
Last Modified: 1y ago

Fields of citing research

Computer Science: 96%
Medicine: 84%
Engineering: 64%
Mathematics: 2%
Environmental Science: 2%

Share of papers citing this model.

Openness

bio.rodeo openness: Fully open · usable and reproducible
59 (Partial)
Usability (can I run it?): 69
Reproducibility (can I retrain it?): 57

Model Openness Framework: Unclassified
Restrictive license on core components

Resources

GitHub Repository
Research Paper
HuggingFace Model
