Cost-aware prediction (CAP): an LLM-supported machine learning pipeline for interpreting heart failure mortality predictions
European Heart Journal - Digital Health

Abstract
Machine learning (ML) models for clinical prediction are often developed independently of downstream value, interpretability, and decision-making needs. As a result, model outputs may be difficult to act upon in practice, especially when trade-offs between various clinical aspects are not explicitly addressed.
This paper presents a cost-aware prediction (CAP) framework designed to support 1-year mortality prediction in patients with heart failure (HF), with the aim of identifying those eligible for home care based on high predicted mortality risk. By combining cost-benefit analysis with large language model (LLM) agents, the framework facilitates clinical interpretability and helps stakeholders navigate real-world trade-offs in ML-based decision support.
First, we developed an ML model predicting 1-year mortality using electronic health records covering all patients with a first in-hospital HF diagnosis between 2017 and 2023 (N=30,021, 22% mortality) in Region Västra Götaland, Sweden (total population 1.8 million). Second, we introduced clinical impact projection (CIP) curves to visualise important cost dimensions: quality of life and healthcare provider expenses. These costs were further divided into treatment and error costs, to assess the consequences of both correct and incorrect predictions. Finally, we developed four LLM agents to generate patient-specific descriptions explaining the certainty of the prediction, cost-benefit considerations, and risk reduction strategies. The system was evaluated by clinicians for its decision support value.
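The split into treatment and error costs across decision thresholds can be illustrated with a minimal sketch. It is not the authors' implementation: the `UnitCosts` fields, the `cost_curve` function, and the rule that treatment cost accrues to correctly flagged patients while error cost accrues to false positives and false negatives are all assumptions made for illustration, and the unit costs are placeholders rather than values from the study.

```python
from dataclasses import dataclass


@dataclass
class UnitCosts:
    """Hypothetical per-patient costs in arbitrary units (not from the study)."""
    treatment: float       # assumed cost of acting on a correct high-risk flag
    false_positive: float  # assumed cost of flagging a patient who survives
    false_negative: float  # assumed cost of missing a patient who dies


def cost_curve(y_true, y_prob, thresholds, costs):
    """Return treatment, error and total cost for each decision threshold."""
    rows = []
    for t in thresholds:
        tp = fp = fn = 0
        for y, p in zip(y_true, y_prob):
            flagged = p >= t
            if flagged and y == 1:
                tp += 1
            elif flagged and y == 0:
                fp += 1
            elif not flagged and y == 1:
                fn += 1
        treatment_cost = tp * costs.treatment
        error_cost = fp * costs.false_positive + fn * costs.false_negative
        rows.append({
            "threshold": t,
            "treatment_cost": treatment_cost,
            "error_cost": error_cost,
            "total_cost": treatment_cost + error_cost,
        })
    return rows


if __name__ == "__main__":
    # Toy outcomes (1 = died within a year) and predicted risks.
    y_true = [1, 0, 1, 0, 0, 1]
    y_prob = [0.9, 0.8, 0.3, 0.1, 0.4, 0.6]
    costs = UnitCosts(treatment=1.0, false_positive=2.0, false_negative=5.0)
    for row in cost_curve(y_true, y_prob, [0.25, 0.5, 0.75], costs):
        print(row)
```

Plotting each cost component against the threshold would give one curve per cost dimension. A separate set of unit costs could be used for quality of life and for provider expenses.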
Among the ML models, the eXtreme gradient boosting model achieved the best performance, with an area under the receiver operating characteristic curve (AUROC) of 0.804 (95% confidence interval (CI) 0.792-0.816), an area under the precision-recall curve (AUPRC) of 0.529 (95% CI 0.502-0.558) and a Brier score of 0.135 (95% CI 0.130-0.140), as shown in Figure 1. The CIP cost curves provided a population-level overview of cost composition across decision thresholds. LLM-generated cost-benefit descriptions enabled analysis of predictions at the individual patient level; an illustrative example is shown in Figure 2. The system was generally well received by the clinicians who evaluated it. However, their feedback emphasised the need to strengthen technical accuracy on speculative tasks.
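The abstract does not state how the confidence intervals were obtained. As a minimal sketch, assuming a percentile bootstrap over patients, AUROC and the Brier score with 95% intervals could be computed as follows. The function names and the toy data are hypothetical, and the pairwise AUROC is quadratic in sample size, so a rank-based version would be preferable for a cohort of this size.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Probability that a random positive is ranked above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def bootstrap_ci(metric, y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric (assumed method)."""
    if len(set(y_true)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        ps = [y_prob[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(metric(ys, ps))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


if __name__ == "__main__":
    y_true = [1, 0, 1, 0, 0, 1, 0, 0]
    y_prob = [0.9, 0.8, 0.3, 0.1, 0.4, 0.6, 0.2, 0.5]
    print("AUROC", auroc(y_true, y_prob), bootstrap_ci(auroc, y_true, y_prob))
    print("Brier", brier_score(y_true, y_prob),
          bootstrap_ci(brier_score, y_true, y_prob))
```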
An ML algorithm was developed to predict 1-year mortality risk. The CAP framework uses LLM agents to integrate the ML classifier's outputs with cost-benefit analysis, supporting clinical decision-making through more transparent and interpretable communication. Although clinical feedback was positive, future work is needed to fine-tune the LLM narratives for accuracy and usefulness.
Contributors

F Dippel
Author

C E Lundberg
Author
Institute of Medicine - Sahlgrenska Academy - University of Gothenburg, Gothenburg, Sweden

M Lindgren
Author

A Rosengren
Author

M Adiels
Author

