Multimodal deep learning for acute myocardial infarction detection from 12-lead electrocardiogram: a multi-centre study with cross-hospital validation

European Heart Journal - Digital Health

27 October 2025

Abstract

Aims

Acute myocardial infarction (AMI) remains a leading global cause of mortality, and timely diagnosis is critical for early intervention. The 12-lead electrocardiogram (ECG) is a central tool for AMI detection. While deep learning (DL) models show promise for automated ECG analysis, most prior studies rely on small, curated datasets with little external validation, which limits their clinical applicability.

Methods and results

We developed a multimodal DL model (Conv-BiLSTM-Attn) that integrates convolutional and recurrent neural networks with an attention mechanism. Using a large, multi-centre dataset of 145 656 ECGs from 96 813 patients across three Swedish hospitals, we trained the model to detect AMI from raw 12-lead ECG signals and demographic inputs (age, sex). Model performance was evaluated under two external validation protocols: generalization across hospitals (GAH) and leave-one-hospital-out (LOHO). Under the GAH protocol, the model achieved an area under the receiver operating characteristic curve (AUROC) of 0.848 (95% CI: 0.84–0.86) and an area under the precision-recall curve (AUPRC) of 0.456, the latter reflecting class imbalance (∼6–10% AMI prevalence). Subgroup AUROCs ranged from 0.79 to 0.92 across age and sex groups. At the Youden-optimized threshold (0.439), the model showed a sensitivity of 0.736, specificity of 0.793, negative predictive value of 0.976, and weighted F1-score of 0.837. Under the LOHO protocol, AUROCs ranged from 0.801 to 0.849. At Youden thresholds (0.34–0.50), sensitivity ranged from 0.671 to 0.776 and specificity from 0.651 to 0.801, indicating generalizability across sites. Conv-BiLSTM-Attn outperformed benchmark models, with Score-CAM highlighting relevant ST–T segments.
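The threshold-based metrics above can be illustrated with a minimal sketch, which is not the study's code. It shows how a Youden-optimized threshold is typically chosen (maximising J = sensitivity + specificity - 1 over candidate cut-offs) and how a negative predictive value follows from sensitivity, specificity, and prevalence. The function names and the toy labels and scores are hypothetical. The 7% prevalence is an assumed value inside the reported ∼6–10% range, not a figure taken from the study.

```python
from typing import Sequence, Tuple


def confusion_at(y_true: Sequence[int], scores: Sequence[float],
                 threshold: float) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn), calling a case positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        positive = score >= threshold
        if positive and label == 1:
            tp += 1
        elif positive and label == 0:
            fp += 1
        elif not positive and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true: Sequence[int],
                     scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    Assumes both classes are present, so neither denominator is zero.
    Ties are resolved in favour of the lowest threshold.
    """
    best_t, best_j = float("nan"), -1.0
    for t in sorted(set(scores)):  # every observed score is a candidate cut-off
        tp, fp, tn, fn = confusion_at(y_true, scores, t)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def npv_from_rates(sensitivity: float, specificity: float,
                   prevalence: float) -> float:
    """Negative predictive value from test characteristics and prevalence."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical toy data: 1 = AMI, 0 = no AMI; scores are model probabilities.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
threshold, j_stat = youden_threshold(toy_labels, toy_scores)
print(f"Youden threshold = {threshold:.2f}, J = {j_stat:.2f}")

# Reported sensitivity and specificity, with an ASSUMED prevalence of 7%.
npv = npv_from_rates(sensitivity=0.736, specificity=0.793, prevalence=0.07)
print(f"NPV at assumed 7% prevalence = {npv:.3f}")
```

Under that assumed prevalence, the computed value is consistent with the reported negative predictive value of 0.976. It also shows why a high NPV is expected at low prevalence even with moderate sensitivity: true negatives far outnumber false negatives when most patients do not have AMI.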

Conclusion

This DL model can support accurate and generalizable AMI detection from routine ECGs, with the Conv-BiLSTM-Attn architecture outperforming current benchmark approaches.
