Machine learning for precision phenotyping and genomic discovery for heart failure with preserved ejection fraction

European Heart Journal

5 November 2025
Organised by: ESC Journals

Abstract

Introduction

Heart failure with preserved ejection fraction (HFpEF) affects over 32 million people globally and carries a 30% mortality within one year of first hospitalization (1–4). Incidence and mortality are rising sharply, driven by the increase in obesity and the absence of effective therapies (1–4). Currently, no HFpEF medication is disease-modifying and none reduces mortality (4). Because genomic-led drug discovery has demonstrated a 2.6-fold improvement in successful drug development, there have been ‘urgent’ calls to understand the genetic architecture of HFpEF (5). Despite this, our current understanding of HFpEF genetics is poor; the only dedicated genome-wide association study (GWAS) discovered one BMI-associated locus (5). The primary limitation is imprecise phenotyping in genetic biobanks (6–9). For example, in the UK Biobank (UKB), all heart failure cases are classified as "Congestive heart failure", "Left ventricular failure", or "Heart failure, unspecified".

Purpose

To use machine learning (ML) to uncover the full genetic architecture of HFpEF.

Methods

To uncover the full genetic architecture of HFpEF, we: a) developed a refined, precise cohort of HFpEF phenotypes, b) trained ML models to predict this precise HFpEF phenotype, c) deployed the models in UKB and assigned every participant a HFpEF probability, and d) conducted GWAS. For the HFpEF cohort, we identified patients at a medical center who had undergone echocardiography and had clinical data available. We classified patients as HFpEF in line with guidelines and clinical trials (10,21,22). We then trained 3 separate ML models to predict precise HFpEF: 1) prediction from 40 biomarkers using XGBoost, 2) prediction from ECGs using neural networks, and 3) prediction from cardiac MRIs using neural networks. We deployed these 3 models on UKB participants; each model generated a HFpEF probability per participant, and we conducted one GWAS on each of the 3 sets of probabilities. We then conducted proteomic analysis to identify proteins highly expressed in UKB participants with a high probability of HFpEF.
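The final step, testing each genetic variant against a model-derived HFpEF probability, can be illustrated with a minimal sketch. The abstract does not state which GWAS software, covariates, or transformations were used, so everything below is an assumption: the function name single_variant_association is hypothetical, the model is an unadjusted linear regression of the probability on genotype dosage, and the p-value uses a normal approximation. A real analysis would adjust for covariates such as age, sex, and ancestry components and would use dedicated tooling.

```python
import math


def single_variant_association(dosages, phenotype):
    """Regress a quantitative phenotype on genotype dosage (0, 1 or 2).

    Illustrative only: no covariates, normal approximation for the p-value.
    Returns (beta, standard_error, two_sided_p).
    """
    n = len(dosages)
    if n != len(phenotype) or n < 3:
        raise ValueError("need at least 3 paired observations")

    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))

    beta = sxy / sxx                      # effect per copy of the allele
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2
              for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta

    if se == 0:                           # perfect fit: no residual variance
        return beta, se, 0.0
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return beta, se, p


# Toy data (made up): allele dosages and model-assigned HFpEF probabilities.
toy_dosages = [0, 1, 2, 0, 1, 2]
toy_probabilities = [0.1, 0.2, 0.3, 0.2, 0.3, 0.4]
beta, se, p = single_variant_association(toy_dosages, toy_probabilities)
print(f"beta={beta:.3f} se={se:.3f} p={p:.4f}")
```

In a genome-wide setting this test would be repeated for every variant and for each of the 3 probability phenotypes, with a multiple-testing threshold applied to the resulting p-values.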

Results

Our 3 ML models predicted HFpEF with acceptable accuracy: AUC 0.85 (95% CI: 0.84-0.86) for the biomarker XGBoost model, 0.79 (0.78-0.80) for the ECG neural network, and 0.75 (0.74-0.76) for the cardiac MRI model. Our genome-wide association studies revealed 47 novel loci for HFpEF, with leading loci including FTO, PRKAG2, AL162414, ITGA4, CAPN8, and HABP4. Proteomic analysis revealed a mixture of proteins with known associations (e.g. NT-proBNP, Renin, and Leptin) and novel proteins associated with HFpEF, such as Fatty Acid Binding Protein 4, Synuclein Gamma, and Insulin-like Growth Factor Binding Protein 1.
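For readers less familiar with the metric, the sketch below shows one common way to obtain an AUC with a 95% confidence interval. The abstract does not say how its intervals were computed; the percentile bootstrap used here is an assumption, and the function names auc and bootstrap_auc_ci and the toy data are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        # Skip resamples that happen to contain a single class.
        if 0 < sum(resampled_labels) < n:
            stats.append(auc(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data (made up): 1 = HFpEF, 0 = not HFpEF, with predicted probabilities.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.30, 0.60]
point = auc(toy_labels, toy_scores)
lower, upper = bootstrap_auc_ci(toy_labels, toy_scores)
print(f"AUC {point:.2f} (95% CI: {lower:.2f}-{upper:.2f})")
```

The pairwise comparison is quadratic in sample size, which is fine for illustration; at biobank scale a rank-based computation would be preferable.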

Conclusions

Machine learning facilitates the prediction of precise HFpEF phenotypes, which in turn reveals the full genetic architecture of HFpEF. The loci and proteins we identified can serve as candidate therapeutic targets for future research aimed at creating truly disease-modifying HFpEF medications.

Figure: Study design

Figure: GWAS results

Contributors

J O'Sullivan

Author

Stanford University, Stanford, United States of America

T Yun

Author

C McLean

Author

E Ashley

Author