Phenogrouping HFpEF trajectories identifies early and end-stage subtypes: a multicentre study using natural language processing
European Heart Journal

Abstract
Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous and frequently underdiagnosed syndrome, complicating timely intervention. Previous clustering studies were predominantly trial-based, limiting real-world applicability. Characterising HFpEF phenogroups in routine clinical practice could improve diagnosis and support targeted therapies.
This study aimed to characterise HFpEF phenogroups using cluster analysis of real-world electronic health record (EHR) data and investigate their clinical trajectories and outcomes.
We conducted a retrospective cohort study using routinely collected EHR data from two UK centres. HFpEF patients were identified per European Society of Cardiology (ESC) criteria using a validated natural language processing pipeline. Unsupervised clustering was performed using latent class analysis (LCA) applied to ten clinical features. External validation was conducted in an independent cohort. Longitudinal transitions between phenogroups and all-cause mortality were analysed.
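To make the clustering step concrete, the following is a minimal sketch of latent class analysis fitted by expectation-maximisation. It is not the study's pipeline. It assumes every clinical feature has been coded as a binary indicator, uses a single deterministic starting point rather than multiple random starts, and the function name `fit_lca` and the toy rows are hypothetical.

```python
import math


def fit_lca(rows, n_classes, n_iter=100, eps=1e-6):
    """Fit a latent class model to rows of 0/1 features using plain EM.

    Returns (class weights, per-class item probabilities, hard labels).
    """
    n_items = len(rows[0])
    # Deterministic, class-specific starting values (a simplification;
    # applied analyses usually compare several random starts).
    weights = [1.0 / n_classes] * n_classes
    probs = [[(k + 1) / (n_classes + 1)] * n_items for k in range(n_classes)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior class membership for every row, in log space.
        resp = []
        for x in rows:
            log_post = []
            for k in range(n_classes):
                lp = math.log(weights[k])
                for j, xj in enumerate(x):
                    p = probs[k][j]
                    lp += math.log(p if xj else 1.0 - p)
                log_post.append(lp)
            top = max(log_post)
            unnorm = [math.exp(lp - top) for lp in log_post]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: update class weights and item probabilities.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / len(rows), eps)
            for j in range(n_items):
                hits = sum(r[k] * x[j] for r, x in zip(resp, rows))
                # Clamp away from 0 and 1 so the logarithms stay finite.
                probs[k][j] = min(max(hits / max(nk, eps), eps), 1.0 - eps)
    # Hard assignment: the class with the highest posterior probability.
    labels = [max(range(n_classes), key=lambda k: r[k]) for r in resp]
    return weights, probs, labels


# Toy, synthetic rows (not patient data): two perfectly separated profiles.
toy_rows = [[1, 1, 1, 1]] * 20 + [[0, 0, 0, 0]] * 20
weights, probs, labels = fit_lca(toy_rows, n_classes=2)
```

In an applied analysis the number of classes would typically be chosen by comparing fit criteria across candidate models, which this sketch does not attempt.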
Among 2,223 patients (median age 75 years, 60% female), 89.6% met ESC criteria but had no clinician-assigned diagnosis. LCA identified four phenogroups: (1) Elderly-Atrial Dysfunction (oldest, atrial fibrillation, significant diastolic dysfunction) [N=703 (32%)]; (2) Cardio-Renal-Metabolic (high burden of diabetes, kidney disease, and cardiac remodelling) [N=530 (24%)]; (3) Obesity-Predominant (younger patients with significant obesity and lower N-terminal pro-B-type natriuretic peptide (NT-proBNP) levels) [N=503 (23%)]; and (4) Young-Low Comorbidity (minimal traditional risk factors, mild cardiac dysfunction, and the lowest likelihood of clinician-assigned diagnosis) [N=487 (22%)]. External validation (N=3,349) confirmed phenogroup reproducibility. The estimated five-year mortality for the Young-Low Comorbidity phenogroup was 26%. Compared with this group, the Cardio-Renal-Metabolic and Elderly-Atrial Dysfunction phenogroups had significantly higher adjusted mortality risks (HR: 1.49; 95% CI: 1.18-1.87; P < 0.001 and HR: 1.30; 95% CI: 1.02-1.67; P < 0.03, respectively). Over a median 3.97-year follow-up, 53% of Young-Low Comorbidity patients progressed to higher-risk phenogroups.
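As a reading aid for the hazard ratios above, the sketch below recovers an approximate two-sided p-value from a reported HR and its 95% confidence interval. It assumes the interval is a Wald interval that is symmetric on the log scale and that a normal approximation applies; the function name `approx_p_from_hr_ci` is hypothetical and this is not the authors' analysis code.

```python
import math


def approx_p_from_hr_ci(hr, lower, upper, z_crit=1.96):
    """Approximate z statistic and two-sided p-value from an HR and its CI."""
    # Standard error of log(HR), recovered from the width of the interval.
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hazard ratios and intervals as reported in the abstract.
for hr, lower, upper in [(1.49, 1.18, 1.87), (1.30, 1.02, 1.67)]:
    z, p = approx_p_from_hr_ci(hr, lower, upper)
    print(f"HR {hr:.2f} (95% CI {lower:.2f}-{upper:.2f}): z = {z:.2f}, p = {p:.4f}")
```

Because the published estimates are rounded to two decimal places, the recovered p-values are only approximate. The interval lying further from 1 (1.18-1.87) yields the smaller p-value.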
AI-driven phenotyping identified four clinically distinct and prognostically relevant HFpEF phenogroups. Our findings highlight the high underdiagnosis rate and rapid progression in early-stage HFpEF, reinforcing the need for improved recognition and early intervention. Targeted therapies may hold particular promise for this group, warranting further investigation.

Figure: Phenogroup trajectories
Contributors

S Brown
Author
King's College Hospital NHS Foundation Trust, London, United Kingdom of Great Britain & Northern Ireland

F Soltani
Author

B S Bernstein
Author

T Searle
Author

G Carr-White
Author

R J B Dobson
Author

T A McDonagh
Author

T F Luscher
Author

C Miller
Author
University of Manchester, Manchester, United Kingdom of Great Britain & Northern Ireland

K O'Gallagher
Author




