Artificial intelligence electrocardiography for left ventricular systolic dysfunction demonstrates preserved performance across demographic training imbalances

European Heart Journal - Digital Health

28 May 2026

Abstract

Aims

Artificial intelligence (AI)-enabled electrocardiograms (AI-ECG) can detect left ventricular systolic dysfunction (LVSD), but demographic imbalance in training datasets may introduce bias. Foundational models, pretrained on large and diverse datasets, may mitigate such concerns. We aimed to assess how the demographic composition of the training dataset affects the performance of an ECG foundational model (ECGFM) for diagnosing LVSD.

Methods and results

We developed an ECG foundational model (ECGFM) using a transformer architecture and self-supervised pretraining on 983 200 ECGs. Using 44 815 paired ECG–echocardiogram records, we trained the model under three training configurations: (1) sex-skewed (male-only or female-only), (2) race-skewed (White-only or non-White-only), and (3) balanced. Models were evaluated by area under the receiver operating characteristic curve (AUROC) on a test cohort comprising 4663 (52%) male and 4300 (48%) female patients for the sex-based configuration, and 4440 (49.5%) White, 558 (6.2%) Black, 925 (10.3%) Asian, and 3040 (33.9%) other patients for the race-based configuration. The ECGFM demonstrated consistent performance across all demographic configurations. Training on male-only or female-only cohorts yielded comparable AUROCs of 0.85–0.90 for predicting LVSD in both sexes in the test set. Similarly, training on White-only or non-White-only cohorts resulted in robust AUROCs (≥0.90) across all racial groups, including Asian, Black, Hispanic/Latino, and American Indian/Native Alaskan subgroups. Balanced and imbalanced training produced comparable accuracy, sensitivity, and specificity. The model was externally tested in EchoNext, with AUROCs of 0.823–0.917 for the sex-based configurations and 0.822–0.917 for the race-based configurations.
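The subgroup evaluation described above can be illustrated with a minimal sketch. It is not the authors' code: the function names (`auroc`, `auroc_by_group`) and the toy records are hypothetical, and AUROC is computed here through its rank-based (Mann-Whitney) interpretation, i.e. the probability that a randomly chosen LVSD-positive ECG receives a higher model score than a randomly chosen negative one, with ties counted as one half.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. Quadratic in the number of cases, which is fine
    for a sketch but not for cohorts of thousands of patients.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_by_group(records):
    """Compute AUROC separately for each demographic group.

    records: iterable of (group, label, score) tuples, where label is
    1 for LVSD and 0 otherwise, and score is the model output.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: auroc(ys, ss) for g, (ys, ss) in grouped.items()}


# Hypothetical toy data, NOT taken from the study.
toy_records = [
    ("male", 1, 0.9), ("male", 0, 0.2), ("male", 1, 0.6), ("male", 0, 0.7),
    ("female", 1, 0.8), ("female", 0, 0.1), ("female", 0, 0.3), ("female", 1, 0.4),
]
toy_result = auroc_by_group(toy_records)
print(toy_result)
```

In this toy example, three of the four male positive-negative pairs are ranked correctly (AUROC 0.75) and all four female pairs are (AUROC 1.0). Repeating the per-group calculation for a model trained under each configuration (for example male-only, female-only, and balanced) would give the kind of comparison reported above.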

Conclusion

Our transformer-based ECG foundational model, pretrained using self-supervised learning, preserved diagnostic accuracy for LVSD across diverse demographic groups, even when trained on demographically imbalanced datasets.