Hybrid machine learning pipeline for multi-timeframe mortality prediction in cardiogenic shock: integrating clustering techniques and comprehensive external validation
European Heart Journal - Acute CardioVascular Care

Abstract
Mortality in cardiogenic shock has remained high over the past decade. A machine learning-based mortality risk score could aid ICU triage, deepen the understanding of cardiogenic shock, and guide intervention strategies based on predicted outcomes.
This study aimed to develop and externally validate a machine learning algorithm for predicting mortality in cardiogenic shock patients using daily-measured routine parameters.
We used a dataset from our cardiogenic shock registry, comprising 147 features, to compare several classifiers for 24-hour mortality prediction. Among the tested classifiers, XGBoost performed best, and the 25 most important features were then selected based on feature importance scores. A machine learning pipeline (Figure 1) was trained on data in patient-day format from three databases, including patients with a cardiac admission diagnosis, lactate ≥2 mmol/L, and vasopressor treatment (SCAI stages C–E). We applied a novel clustering technique to separate observations with high and low predictive accuracy, designating them as "high confidence" and "low confidence" subgroups. Multiple models were then developed to predict mortality over different timeframes (24 hours, 2 days, 3 days, and so on up to 15 days). To assess generalizability, external validation was performed on three additional datasets extracted from separate, publicly available databases.
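The abstract does not state how the outcome label was derived for each timeframe. A minimal sketch, assuming each patient-day row records the hour at which the daily parameters were taken and the hour of death (if any), both counted from ICU admission; the `PatientDay` schema and the function names are hypothetical. A death inside a short window is also inside every longer one, so the share of positive labels can only grow as the horizon lengthens.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

# Prediction horizons in days, as listed in the abstract: 24 h, 2 d, 3 d, ... up to 15 d
HORIZONS_DAYS: Tuple[int, ...] = tuple(range(1, 16))


@dataclass
class PatientDay:
    """One row of patient-day data (hypothetical schema)."""
    patient_id: str
    obs_hour: float              # hours since ICU admission when the daily parameters were taken
    death_hour: Optional[float]  # hours since admission at death, or None if the patient survived


def horizon_labels(day: PatientDay,
                   horizons: Sequence[int] = HORIZONS_DAYS) -> Dict[int, int]:
    """Map each horizon (in days) to 1 if death occurs within it after obs_hour, else 0."""
    labels = {}
    for h in horizons:
        if day.death_hour is None:
            labels[h] = 0
        else:
            delta = day.death_hour - day.obs_hour
            labels[h] = int(0 <= delta <= 24 * h)
    return labels


def prevalence(days: List[PatientDay], h: int) -> float:
    """Fraction of patient days labelled positive at horizon h."""
    return sum(horizon_labels(d)[h] for d in days) / len(days)


# Toy rows, invented for illustration only
days = [
    PatientDay("a", obs_hour=48.0, death_hour=100.0),  # dies 52 h after this observation
    PatientDay("b", obs_hour=0.0, death_hour=None),    # survives
    PatientDay("c", obs_hour=0.0, death_hour=10.0),    # dies 10 h after this observation
]
for h in (1, 2, 3):
    print(h, round(prevalence(days, h), 3))
```

Each horizon then gets its own binary target column, which is one way to train the separate per-timeframe models the abstract describes.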
The training set included 23,925 patient days and the external test set 24,037 patient days. For 24-hour mortality prediction in the training dataset, the high confidence subgroup contained 12,695 patient days and the low confidence subgroup 11,230. The model achieved an AUROC of 0.97 in the high confidence subgroup and 0.84 in the low confidence subgroup, with an overall AUROC of 0.92. In external validation, the high confidence subgroup included 11,594 patient days and the low confidence subgroup 12,443. The classifier achieved an AUROC of 0.92 in the high confidence subgroup and 0.73 in the low confidence subgroup, with an overall AUROC of 0.84. Although AUROC declined at longer prediction horizons, the area under the precision-recall curve (PRAUC) increased, likely due to label rebalancing as the proportion of deaths rises with longer horizons (Figure 2).
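The results above report AUROC separately for the high and low confidence subgroups and for all patient days pooled. A minimal, dependency-free sketch of that evaluation, assuming each patient day already carries a subgroup label; the function names and toy data are hypothetical, and for tens of thousands of rows one would normally use scikit-learn's `roc_auc_score` rather than this O(n²) pair count. The sketch also includes the no-skill PRAUC baseline, which equals the positive prevalence; this is why PRAUC can rise at longer horizons even while AUROC falls.

```python
from typing import Dict, Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def subgroup_aurocs(y_true: Sequence[int], y_score: Sequence[float],
                    subgroup: Sequence[str]) -> Dict[str, float]:
    """AUROC within each subgroup label plus the pooled ('overall') AUROC."""
    result = {}
    for g in sorted(set(subgroup)):
        idx = [i for i, s in enumerate(subgroup) if s == g]
        result[g] = auroc([y_true[i] for i in idx], [y_score[i] for i in idx])
    result["overall"] = auroc(y_true, y_score)
    return result


def no_skill_prauc(y_true: Sequence[int]) -> float:
    """PRAUC of an uninformative classifier: its precision equals the positive prevalence."""
    return sum(y_true) / len(y_true)


# Toy data, invented for illustration: 4 "high" and 4 "low" confidence patient days
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.1, 0.8, 0.2, 0.6, 0.5, 0.4, 0.7]
subgroup = ["high"] * 4 + ["low"] * 4

print(subgroup_aurocs(y_true, y_score, subgroup))
print(no_skill_prauc(y_true))
```

On the toy data this gives 1.0 for the high subgroup, 0.25 for the low subgroup and 0.8125 pooled. The pooled value also counts positive/negative pairs drawn from different subgroups, so it is not simply a weighted average of the subgroup AUROCs.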
To enhance interpretability, SHAP waterfall plots and DiCE (dice_ml) counterfactual explanations were used to clarify individual predictions, which may provide actionable insights for clinicians in the future. A user-friendly interface was developed for clinical application.
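A SHAP waterfall plot starts from a base value and adds one signed contribution per feature until it reaches the individual prediction. The sketch below is a conceptual illustration only: it computes exact Shapley values by enumerating every feature coalition for a hypothetical three-feature toy score, replacing absent features with a single baseline vector, so the base value is simply the score at that baseline. It is not the shap library's implementation (for tree ensembles such as XGBoost, shap's `TreeExplainer` uses a much faster tree-specific algorithm and a background dataset), and the model, feature names, and baseline are invented.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x by enumerating all coalitions.

    Features outside a coalition take their baseline value, so the
    base value of the explanation is f(baseline).
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score with an interaction between features 0 and 2
def toy_score(z: List[float]) -> float:
    return 0.5 * z[0] + 0.2 * z[1] + 0.1 * z[0] * z[2]


names = ["feature_0", "feature_1", "feature_2"]  # placeholder names
x = [2.0, 1.0, 3.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_score, x, baseline)
base = toy_score(baseline)

# Text "waterfall": start at the base value, add contributions largest first
running = base
print(f"base value: {base:.3f}")
for name, contrib in sorted(zip(names, phi), key=lambda t: -abs(t[1])):
    running += contrib
    print(f"{name}: {contrib:+.3f} -> {running:.3f}")
print(f"prediction: {toy_score(x):.3f}")
```

The contributions always sum to the prediction minus the base value (the Shapley efficiency property), which is what lets a waterfall plot walk from the base value to the individual prediction. The interaction term here is split equally between features 0 and 2.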
This study successfully predicted mortality in patients with cardiogenic shock across multiple timeframes, with strong AUROC performance, particularly in the newly identified high confidence subgroup.




