Prediction of normal echocardiograms from 12-lead electrocardiograms using deep learning to improve outpatient screening for structural heart disease

European Heart Journal - Digital Health

12 January 2026
Organised by: ESC Journals

Abstract

Introduction

Echocardiography is indispensable for diagnosing structural heart disease (SHD). However, over 40% of initial outpatient imaging studies show no clinically significant findings, contributing to unnecessary use of echocardiography and to capacity constraints in cardiology clinics.

Purpose

To develop and validate an artificial intelligence-based electrocardiogram interpretation (AI-ECG) model that can accurately identify patients unlikely to have clinically significant SHD, enabling more efficient use of echocardiography.

Methods

We retrospectively paired ECGs with transthoracic echocardiograms performed within 90 days at two Dutch hospitals (2009-2023). We built a two-stage ensemble model. First, a convolutional neural network analyzed the median-beat ECG and produced nine probability scores, one for each of nine predefined abnormalities spanning moderate-or-greater valvular disease, left or right ventricular systolic dysfunction or dilation, and grade ≥II diastolic dysfunction. These probabilities, together with patient age and sex, were then input into an XGBoost classifier, which outputs a single probability that the echocardiogram is abnormal (any of the nine abnormalities present). The operating point was fixed at 95% sensitivity in the development set to maximize negative predictive value, and model performance was evaluated in an outpatient test cohort.
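The second stage and the choice of operating point can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the CNN and the trained XGBoost model are not shown, the helper names are hypothetical, the abstract does not specify the order of the nine CNN outputs, and encoding sex as 1.0 for female is an arbitrary illustrative choice. The threshold function returns the highest score cut-off that still flags at least 95% of abnormal echocardiograms in the development data; that cut-off would then be applied unchanged to the test cohort.

```python
import math
from typing import Sequence

# Assumption: the CNN emits nine probabilities in some fixed but unspecified
# order; they are treated here as an opaque vector.
N_ABNORMALITIES = 9


def stage2_features(cnn_probs: Sequence[float], age: float, is_female: bool) -> list[float]:
    """Build one input row for the stage-2 classifier: 9 CNN probabilities + age + sex.

    Hypothetical helper; encoding sex as 1.0 for female is an illustrative choice.
    """
    if len(cnn_probs) != N_ABNORMALITIES:
        raise ValueError(f"expected {N_ABNORMALITIES} probabilities, got {len(cnn_probs)}")
    return [float(p) for p in cnn_probs] + [float(age), 1.0 if is_female else 0.0]


def threshold_at_sensitivity(scores: Sequence[float], labels: Sequence[int],
                             target: float = 0.95) -> float:
    """Highest cut-off t such that flagging score >= t as 'abnormal' reaches
    at least `target` sensitivity on the given (development) data."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one abnormal (positive) example")
    # Minimum number of positives that must be flagged; the small epsilon guards
    # against a product such as 0.95 * n landing just above an integer.
    k = max(1, math.ceil(target * len(positives) - 1e-9))
    # Every positive scoring at or above the k-th highest positive score is
    # flagged, so at least k positives (more if there are ties) are caught.
    return positives[k - 1]


# Toy development-set scores from a stage-2 classifier (made-up numbers).
dev_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
dev_labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
cutoff = threshold_at_sensitivity(dev_scores, dev_labels)
print(cutoff)  # 0.4: all four toy positives are flagged
```

Because the cut-off is chosen to guarantee the sensitivity target, specificity (and therefore the share of echocardiograms that could be deferred) is whatever results at that point. In a real pipeline the scores would be the trained classifier's predicted probabilities, for example from `predict_proba` on an `xgboost.XGBClassifier`.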

Results

The development sets comprised 80,635 ECGs from 52,006 patients (31.5% SHD), and the outpatient test cohort comprised 4,024 ECGs from 4,024 patients (20.3% SHD). Figure 1 shows the ROC curve (AUROC 0.84, 95% CI: 0.83–0.86). At the prespecified threshold, sensitivity was 0.95 (95% CI: 0.93–0.96), specificity 0.35 (95% CI: 0.34–0.37), PPV 0.27 (95% CI: 0.25–0.29) and NPV 0.96 (95% CI: 0.95–0.98). The model would have deferred 1,130 (28.1%) echocardiograms while missing 41 significant SHD cases (1.0% of the test cohort), none of which were classified as severe.
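The deferral and NPV figures follow directly from two reported counts: the 1,130 deferred echocardiograms are the model's predicted negatives (true plus false negatives), and the 41 missed cases are the false negatives. The sketch below reproduces that arithmetic. The Wilson score interval is an assumed confidence-interval method chosen for illustration; the abstract does not state how its intervals were computed, so the interval printed here need not match the reported 0.95–0.98.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts reported for the outpatient test cohort.
N_TEST = 4024    # ECGs (one per patient)
DEFERRED = 1130  # classified as normal: true negatives + false negatives
MISSED = 41      # significant SHD among the deferred: false negatives

true_negatives = DEFERRED - MISSED  # 1089
npv = true_negatives / DEFERRED
deferral_rate = DEFERRED / N_TEST
missed_fraction = MISSED / N_TEST
npv_low, npv_high = wilson_interval(true_negatives, DEFERRED)

print(f"NPV {npv:.3f}")                           # NPV 0.964
print(f"deferred {deferral_rate:.1%}")            # deferred 28.1%
print(f"missed {missed_fraction:.1%} of cohort")  # missed 1.0% of cohort
print(f"Wilson 95% CI {npv_low:.3f}-{npv_high:.3f}")
```

Note that the 1.0% miss figure is relative to the whole test cohort, not to the SHD cases; relative to SHD cases the misses correspond to the complement of the 0.95 sensitivity.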

Conclusions

An AI-ECG screening model can effectively rule out clinically significant SHD with high sensitivity and NPV. In our retrospective analysis, it could have reduced the volume of outpatient echocardiograms by nearly 30%, improving efficiency. Prospective trials are warranted to validate these findings and to assess the safety, integration into clinical workflows, and cost-effectiveness of implementing the model in practice.

Figure 1: Discriminative performance of AI-ECG

Contributors

B Arends, University Medical Center Utrecht, Utrecht, Netherlands (The)
D Ahmetagic, University Medical Center Utrecht, Utrecht, Netherlands (The)
T P Mast
R Van Es
