Mismatch between lead I from the 12-Lead ECG and smartwatch ECG: can AI models adapt for left ventricular systolic dysfunction prediction?

European Heart Journal

5 November 2025
Organised by: ESC Journals

Abstract

Background/Introduction

Most artificial intelligence-enabled electrocardiography (AI-ECG) research on single-lead ECG has trained models on Lead I data extracted from the standard 12-lead ECG (S-Lead I). However, it remains uncertain whether results validated only on S-Lead I data can be directly applied to wearable devices such as smartwatches. S-Lead I and the Lead I acquired via a smartwatch (W-Lead I) differ substantially in sensor characteristics, noise patterns, and measurement methods. These differences suggest that a model trained solely on S-Lead I may lose performance when applied directly to W-Lead I. Despite these concerns, there is limited evidence quantifying this performance gap.

Purpose

The primary goal of this study was to evaluate the generalizability of an AI-enabled left ventricular systolic dysfunction (LVSD) prediction model, trained on S-Lead I, when applied to W-Lead I. We examined the correlation between S-Lead I and W-Lead I performance throughout model training to determine whether improvements on S-Lead I were consistently reflected in W-Lead I, and therefore whether the model generalizes across different ECG acquisition methods.

Method

We retrospectively collected 151,065 S-Lead I ECGs from 151,137 patients who underwent echocardiography within 14 days of the ECG at Hospital A between 2017 and 2024, forming an ECG cohort. Additionally, we collected 2,441 W-Lead I recordings acquired within 24 hours of an S-Lead I recording. The paired S-Lead I and W-Lead I data were used for model evaluation, while the S-Lead I data alone were used for model training. Predictive performance was assessed using the mean of the area under the receiver operating characteristic curve (AUROC) and the area under the precision–recall curve (AUPRC). In addition, correlation values were calculated across every training iteration to quantify how well performance trends on S-Lead I translated to W-Lead I.
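The performance summary described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (auroc, average_precision, mean_auroc_auprc) are hypothetical, AUPRC is approximated here by average precision, and reading "the mean of AUROC and AUPRC" as their simple average is an assumption. The labels and scores are made-up toy values, not study data.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation, averaging ranks over ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision (a common AUPRC estimate); tied scores are not treated specially."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    total = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this positive's rank
    return total / n_pos


def mean_auroc_auprc(labels, scores):
    """Assumed summary metric: simple average of AUROC and average precision."""
    return (auroc(labels, scores) + average_precision(labels, scores)) / 2


# Toy example (made-up values): 1 = LVSD present, 0 = absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

In such a setup the same functions would be applied to each test set (S-Lead I, and each smartwatch W-Lead I set) at every training checkpoint, which yields one performance value per test set per iteration.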

Results

As shown in Figure 1.B, both the training dataset and the S-Lead I test set demonstrated a steady improvement in AUROC and AUPRC throughout the iterative training process, maintaining a strong performance correlation of 0.89 or higher. In contrast, the Apple Watch and Galaxy Watch W-Lead I test sets displayed more pronounced fluctuations in performance and did not follow the same upward trend, with notably lower correlation values (Apple Watch: 0.54; Galaxy Watch: 0.69; see Figure 1.C). These results indicate that a model trained exclusively on S-Lead I may not sufficiently capture the sensor-specific and physiological nuances of W-Lead I.
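A correlation between performance trajectories of this kind could be computed roughly as follows. This is a sketch under stated assumptions: the abstract does not say which correlation coefficient was used, so Pearson's r is assumed, and the function name pearson and the per-checkpoint values are hypothetical illustrations rather than study results.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences of per-checkpoint metrics."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up metric values at successive training checkpoints (not study data).
train_metric = [0.70, 0.75, 0.80, 0.85]  # hypothetical training-set trajectory
watch_metric = [0.65, 0.70, 0.66, 0.72]  # hypothetical W-Lead I trajectory

# A value near 1 means the watch test set improves in step with training;
# a lower value means its performance fluctuates independently of training progress.
print(pearson(train_metric, watch_metric))
```

Under this reading, a high coefficient for the S-Lead I test set and lower coefficients for the smartwatch sets would correspond to the pattern described above, where gains during training carry over to S-Lead I but only partly to W-Lead I.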

Conclusion

The results indicate that an LVSD prediction model trained exclusively on S-Lead I has limited generalizability when applied to real-world W-Lead I data. Future efforts should focus on developing algorithms that account for the signal characteristics of smartwatch ECG, thereby enhancing clinical utility.

Contributors

J Jang

Author

H S Lee

Author

Sejong General Hospital, Seoul, Korea (Republic of)
