Development and validation of a parsimonious AI-based risk score for mortality in heart failure: a UK cohort study

European Heart Journal - Digital Health

12 January 2026
Organised by: ESC Journals

Abstract

Background

Accurate risk stratification in heart failure (HF) populations is crucial. Existing simple scores, such as MAGGIC, show limited discrimination (C-index < 0.80) and require specialised tests like echocardiography. Conversely, while advanced Artificial Intelligence (AI) models offer greater accuracy, their operational complexity and data access requirements often hinder clinical adoption.

Purpose

To develop and validate a parsimonious, high-performing AI-based risk model for all-cause mortality in HF patients by leveraging insights from a complex AI model and feature engineering.

Methods

Using a cohort of 373,389 patients with HF (≥18 years; 1,153 English practices for derivation, 289 for validation) from the Clinical Practice Research Datalink (CPRD) Aurum dataset, we developed and validated an AI-based risk prediction model for all-cause mortality. A MAGGIC-based Cox proportional hazards model adapted for electronic health records (EHR) was established as a performance benchmark. An initial Multi-layer Perceptron (MLP) model with a survival framework was trained with MAGGIC variables. This MLP model was then enhanced by incorporating highly predictive comorbidity features, identified by an explainable Transformer model trained on longitudinal EHR data, and by applying feature distillation to derive a final, optimised 11-variable model (named MLP-Distilled). The model employed readily ascertainable variables including age, BMI, year of birth (as a proxy for birth cohort effects), and key comorbidities such as cancers.
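The abstract does not state which loss the "survival framework" uses. One common choice for MLP survival models is the Cox negative log partial likelihood, and the sketch below illustrates that choice under this assumption only; it is not the authors' implementation. The function name and arguments are hypothetical, `log_risk` stands for the network's scalar output per patient, and tied event times are not given any special handling.

```python
import math


def _logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def cox_neg_log_partial_likelihood(log_risk, times, events):
    """Mean negative log partial likelihood over observed events.

    log_risk: model output per patient (higher means higher hazard)
    times:    follow-up time per patient
    events:   1 if death was observed, 0 if censored
    """
    # Walk from the longest to the shortest follow-up so that the running
    # log-sum-exp always covers the current patient's risk set.
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    running = -math.inf
    total, n_events = 0.0, 0
    for i in order:
        running = _logaddexp(running, log_risk[i])
        if events[i]:
            total += log_risk[i] - running
            n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events
```

Minimising this quantity pushes the model to assign higher scores to patients who die earlier, which is the same ordering that the C-index rewards.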

Results

MLP-Distilled demonstrated significantly improved discriminatory performance (C-index: 0.801, 95% CI [0.796, 0.806]) compared to the benchmark MAGGIC-EHR model (C-index: 0.735, 95% CI [0.729, 0.742]), while maintaining excellent calibration. This parsimonious model uses fewer and more accessible variables than MAGGIC (11 vs. 13) and requires no specialised tests. The model was also validated for the secondary outcomes of cardiovascular events and rehospitalisation.
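For readers unfamiliar with the C-index reported above, the following minimal sketch computes Harrell's concordance index on a toy example. It assumes the standard pairwise definition; the abstract does not say which C-index variant or software was used, and the function name and toy data are illustrative only. Pairs with tied times are skipped for simplicity.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by risk score."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            # Tied times, or the earlier patient was censored: the pair
            # is not comparable in this simplified version.
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the third is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk_scores = [0.9, 0.4, 0.5, 0.1]
c_index = harrell_c_index(times, events, risk_scores)  # 4 of 5 pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which 0.801 and 0.735 should be read.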

Conclusion

Utilising knowledge from a complex EHR-trained AI model combined with feature distillation yielded a parsimonious yet powerful risk score for HF mortality. This novel approach offers a pathway to clinically implementable tools that balance predictive accuracy with pragmatic utility, potentially improving routine HF risk stratification.

[Figure: Discrimination_Calibration.jpeg]

[Figure: Impact_Analysis_50_Threshold.jpeg]

Contributors

N Ahmed

Author

University of Oxford, Oxford, United Kingdom of Great Britain & Northern Ireland

S Rao

Author

University of Oxford, Oxford, United Kingdom of Great Britain & Northern Ireland

K Rahimi

Author

University of Oxford, Oxford, United Kingdom of Great Britain & Northern Ireland
