Artificial intelligence electrocardiography for left ventricular systolic dysfunction demonstrates preserved performance across demographic training imbalances
European Heart Journal - Digital Health

Abstract
Artificial intelligence (AI)-enabled electrocardiograms (AI-ECG) can detect left ventricular systolic dysfunction (LVSD), but demographic imbalance in training datasets may introduce bias. Foundational models, pretrained on large and diverse datasets, may mitigate such concerns. We aimed to assess how the demographic composition of training datasets affects the performance of an ECG foundational model (ECGFM) for diagnosing LVSD.
We developed an ECG foundational model (ECGFM) using a transformer architecture and self-supervised pretraining on 983 200 ECGs. Using 44 815 paired ECG–echocardiogram datasets, we trained the model under three demographic configurations: (1) sex-skewed (male-only or female-only), (2) race-skewed (White-only or non-White), and (3) balanced. Models were evaluated using the area under the receiver operating characteristic curve (AUROC) on a test cohort comprising 4663 (52%) male and 4300 (48%) female patients for the sex-based configuration, and 4440 (49.5%) White, 558 (6.2%) Black, 925 (10.3%) Asian, and 3040 (33.9%) other patients for the race-based configuration. The ECGFM demonstrated consistent performance across all demographic configurations. Training on male-only or female-only cohorts yielded comparable AUROC scores of 0.85–0.90 for both sexes in the test set in predicting LVSD. Similarly, training on White-only or non-White cohorts resulted in robust AUROC scores (≥0.90) across all racial groups, including Asian, Black, Hispanic/Latino, and American Indian/Native Alaskan subgroups. Balanced and imbalanced training produced comparable accuracy, sensitivity, and specificity. External testing in EchoNext yielded AUROC scores of 0.823–0.917 for the sex-based configurations and 0.822–0.917 for the race-based configurations.
Our transformer-based ECG foundational model, pretrained using self-supervised learning, demonstrated preserved diagnostic accuracy for LVSD across diverse demographic groups, even when trained on demographically imbalanced datasets.
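The subgroup evaluation described in the abstract, AUROC reported separately for each sex or racial group in the test cohort, can be illustrated with a minimal sketch. This is not the authors' code. The record layout (`label`, `score`, `sex`), the function names `auroc` and `auroc_by_group`, and the toy values are all assumptions made for illustration, and AUROC is computed here through the standard rank-based (Mann-Whitney) formulation rather than any particular library.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the run of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auroc_by_group(records, group_key):
    """Compute AUROC separately for each value of a demographic field.

    Each record is a dict with 'label' (1 = LVSD, 0 = no LVSD), 'score'
    (model output), and the demographic field named by group_key.
    """
    grouped = defaultdict(lambda: ([], []))
    for record in records:
        labels, scores = grouped[record[group_key]]
        labels.append(record["label"])
        scores.append(record["score"])
    return {group: auroc(l, s) for group, (l, s) in grouped.items()}


# Toy, made-up records purely to exercise the functions (not study data).
toy = [
    {"sex": "F", "label": 1, "score": 0.9},
    {"sex": "F", "label": 0, "score": 0.2},
    {"sex": "F", "label": 0, "score": 0.4},
    {"sex": "M", "label": 1, "score": 0.7},
    {"sex": "M", "label": 1, "score": 0.3},
    {"sex": "M", "label": 0, "score": 0.5},
    {"sex": "M", "label": 0, "score": 0.1},
]
result = auroc_by_group(toy, "sex")
print(result)
```

Under this layout, comparing training configurations would amount to calling `auroc_by_group` once per trained model on the same test records and comparing the per-group values side by side.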
Contributors

P Nelson Hsieh
Author

Parth Agrawal
Author

Aman Alok
Author

Sathis Kumar
Author

Charu Ramanathan
Author

Venkatesh L Murthy
Author

Niraj Varma
Author

Venkat Nagarajan
Author

Andrew P Ambrosy
Author

Mattheus Ramsis
Author

Antonis A Armoundas
Author

Jagmeet P Singh
Author