Challenges in the application of ECG foundation models to out-of-domain populations: an evaluation in professional athletes

European Heart Journal - Digital Health

12 January 2026
Organised by: ESC Journals

Abstract

Background

Comprehensive cardiovascular screening is crucial to safeguarding athletes' health, and ECG screening is a key part of this process. AI support could be essential, with recent ECG foundation models providing scalable, low-input tools for rapid, cost-effective interpretation, enabling large-scale, long-term screening. These models are pretrained and fine-tuned primarily in clinical contexts; hence, it is crucial to evaluate their performance in out-of-domain populations such as athletes, particularly given the limited availability of athlete-specific data for fine-tuning. Ensuring robustness in these settings is vital to safely distinguish normal adaptations from true abnormalities.

Purpose

This study evaluates the out-of-domain generalization of a recently developed ECG foundation model for ECG interpretation, using a dataset of athletes' ECGs.

Methods

A cohort of 43 professional athletes underwent 12-lead ECG screening. A board-certified cardiologist annotated each athlete's ECG to provide reference labels for evaluating model performance. We applied a recently published ECG foundation model, ECG-FM [2]. The model had been pre-trained on 873,632 ECGs sourced from PhysioNet2021 [3] (n = 85,955) and MIMIC-IV-ECG [4] (n = 787,677) and then fine-tuned on the labelled PhysioNet2021 dataset to recognize various cardiac conditions. It was applied to the athlete ECGs without additional fine-tuning or adaptation and was blind to the cardiologist's annotations. For evaluation, per-class accuracy was calculated. Sensitivity, specificity, and area under the precision-recall curve (AUCPR) were computed per class, with sensitivity requiring at least one positive case and specificity at least one negative case. Additionally, per-patient accuracy and overall Hamming loss were calculated. A permutation test (1,000 label shuffles) assessed whether the observed per-patient accuracy exceeded chance.
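The evaluation metrics described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy labels, and the permutation scheme (shuffling the cardiologist's annotation rows across patients) are assumptions made for illustration, since the abstract does not specify how labels were shuffled. AUCPR is omitted because it requires the model's continuous output scores rather than binary predictions.

```python
import random


def hamming_loss(y_true, y_pred):
    """Fraction of all (patient, class) entries where prediction != annotation."""
    pairs = [(t, p) for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp)]
    return sum(t != p for t, p in pairs) / len(pairs)


def per_patient_accuracy(y_true, y_pred):
    """Share of correctly predicted classes for each patient (one value per row)."""
    return [sum(t == p for t, p in zip(rt, rp)) / len(rt)
            for rt, rp in zip(y_true, y_pred)]


def per_class_metrics(y_true, y_pred, k):
    """Accuracy, sensitivity and specificity for class column k.

    Sensitivity is None when the class has no positive cases and
    specificity is None when it has no negative cases.
    """
    col = [(rt[k], rp[k]) for rt, rp in zip(y_true, y_pred)]
    tp = sum(t == 1 and p == 1 for t, p in col)
    fn = sum(t == 1 and p == 0 for t, p in col)
    tn = sum(t == 0 and p == 0 for t, p in col)
    fp = sum(t == 0 and p == 1 for t, p in col)
    return {
        "accuracy": (tp + tn) / len(col),
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


def permutation_p_value(y_true, y_pred, n_perm=1000, seed=0):
    """One-sided permutation test on mean per-patient accuracy.

    Assumption: annotation rows are shuffled across patients, which breaks
    the link between each ECG and its labels while keeping label frequencies.
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = mean(per_patient_accuracy(y_true, y_pred))
    shuffled = list(y_true)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean(per_patient_accuracy(shuffled, y_pred)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic toy labels (4 patients x 3 classes); NOT data from the study.
toy_true = [[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 0, 0]]
toy_pred = [[1, 0, 0], [0, 0, 0], [1, 0, 1], [0, 0, 0]]

print("Hamming loss:", hamming_loss(toy_true, toy_pred))
print("Per-patient accuracy:", per_patient_accuracy(toy_true, toy_pred))
print("Class 1 metrics:", per_class_metrics(toy_true, toy_pred, 1))
print("Permutation p-value:", permutation_p_value(toy_true, toy_pred))
```

With the add-one correction used in this sketch, 1,000 shuffles cannot produce an estimated p-value below 1/1001.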

Results

The model showed a Hamming loss of 0.108 (about 11% of all label predictions were misclassified). Mean per-patient accuracy was 0.892 (SD: 0.059; p < 0.001; see Figure 1). Across classes, specificity was high (mean: 0.97, SD: 0.09) and accuracy was solid (mean: 0.89, SD: 0.18), whereas sensitivity (mean: 0.33, SD: 0.36) and AUCPR (mean: 0.68, SD: 0.29) varied across classes. Bundle branch block and sinus rhythm were challenging to detect, while sinus bradycardia was identified reliably (see Table 1).
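The reported Hamming loss of 0.108 and mean per-patient accuracy of 0.892 are consistent with each other. A short derivation shows why, assuming (the abstract does not state this explicitly) that per-patient accuracy is the label-wise accuracy over the same L classes for each of the N = 43 patients; the symbols below are introduced here for illustration only.

```latex
% y_{ik}: cardiologist label, \hat{y}_{ik}: model prediction
% for patient i (i = 1..N) and class k (k = 1..L)
\[
\mathrm{HL} = \frac{1}{N L} \sum_{i=1}^{N} \sum_{k=1}^{L}
  \mathbb{1}\!\left[ y_{ik} \neq \hat{y}_{ik} \right],
\qquad
\mathrm{Acc}_i = \frac{1}{L} \sum_{k=1}^{L}
  \mathbb{1}\!\left[ y_{ik} = \hat{y}_{ik} \right]
\]
\[
\frac{1}{N} \sum_{i=1}^{N} \mathrm{Acc}_i
  = 1 - \mathrm{HL}
  = 1 - 0.108
  = 0.892
\]
```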

Conclusion

Despite being trained on clinical data, the ECG foundation model performed well on athlete ECGs without fine-tuning, achieving high specificity and accuracy and detecting many athlete heart adaptations. Lower sensitivity for some athlete-specific abnormalities highlights the need for refinement, but overall, this successful pilot paves the way for the development of AI-supported cardiovascular risk screening.

Contributors

C Goelz

Author

University Hospital of Munich, Munich, Germany

S Lang

Author

S Brunner

Author

University Hospital of Munich, Munich, Germany

S Vieluf

Author

Ludwig Maximilians University Munich, Germany
