Large language models approach clinician performance in ESC cardiovascular risk stratification: a vignette-based benchmark study
European Heart Journal - Digital Health

Abstract
Guideline-based cardiovascular risk stratification requires three distinct competencies: extracting risk-factor data from clinical text, computing a validated risk score, and applying guideline-defined thresholds to assign a final risk category. We evaluated contemporary large language models (LLMs) on each of these tasks within the European Society of Cardiology (ESC) SCORE2 framework and compared their performance with a benchmark pooled from individual clinicians, placing the findings in the context of real-world human reproducibility.
Eleven LLMs were evaluated using 30 simulated outpatient clinical vignettes presented in both Portuguese and English. For each vignette, models extracted cardiovascular risk factors, determined SCORE2 applicability, generated 10-year risk estimates where appropriate, and assigned a final three-class ESC risk category. A committee of three cardiologists established the reference standard; eight independent clinicians provided an individual-level human benchmark. Traditional risk-factor extraction was near-perfect across all models (micro-F1 0.97–0.99). Agreement with expert-assigned final risk categories was moderate and variable (best: GPT-4o, quadratic-weighted κw 0.69, 95% CI 0.44–0.84), with 10 of 11 models more often underestimating than overestimating risk. To isolate the source of classification error,
Contemporary LLMs reliably extract cardiovascular risk information from clinical text, and the best-performing systems achieved agreement within the range of average individual clinicians on this structured task. Their principal limitation lies in downstream computation and rule application.
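For readers unfamiliar with the two agreement metrics reported above, the following is a minimal sketch of how a micro-averaged F1 score and a quadratic-weighted Cohen's kappa can be computed. It is not the study's code. The function names, the set-based representation of extracted risk factors, and the integer coding of the three ESC categories (assumed here as 0 = low-to-moderate, 1 = high, 2 = very high) are all illustrative assumptions.

```python
from collections import Counter


def micro_f1(reference_sets, predicted_sets):
    """Micro-averaged F1 over per-vignette sets of risk-factor labels.

    Hypothetical input format: one set of labels per vignette.
    Counts are pooled across all vignettes before computing F1.
    """
    tp = fp = fn = 0
    for ref, pred in zip(reference_sets, predicted_sets):
        tp += len(ref & pred)   # factors correctly extracted
        fp += len(pred - ref)   # factors extracted but absent from the reference
        fn += len(ref - pred)   # reference factors that were missed
    denom = 2 * tp + fp + fn
    # F1 is undefined when there is nothing to find and nothing predicted.
    return 2 * tp / denom if denom else float("nan")


def quadratic_weighted_kappa(reference, predicted, n_classes=3):
    """Cohen's kappa with quadratic weights for ordinal labels 0..n_classes-1.

    Assumed coding (not from the source): 0 = low-to-moderate,
    1 = high, 2 = very high risk.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    observed = Counter(zip(reference, predicted))
    ref_marginal = Counter(reference)
    pred_marginal = Counter(predicted)
    max_dist = (n_classes - 1) ** 2

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Penalty grows with the squared distance between categories.
            weight = (i - j) ** 2 / max_dist
            observed_disagreement += weight * observed[(i, j)] / n
            expected_disagreement += (
                weight * ref_marginal[i] * pred_marginal[j] / (n * n)
            )
    if expected_disagreement == 0:
        # Both raters used a single identical class: kappa is undefined.
        return float("nan")
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Toy data for illustration only; these are not study results.
    ref_factors = [{"smoking", "diabetes"}, {"hypertension"}]
    llm_factors = [{"smoking"}, {"hypertension", "diabetes"}]
    print(round(micro_f1(ref_factors, llm_factors), 3))

    ref_cat = [0, 1, 2, 2, 1, 0]
    llm_cat = [0, 1, 1, 2, 1, 0]  # one vignette underestimated by one category
    print(round(quadratic_weighted_kappa(ref_cat, llm_cat), 3))
```

Quadratic weighting penalizes a two-category error (for example, very high risk labelled low-to-moderate) four times as heavily as a one-category error, which is why it suits an ordinal three-class outcome. The sketch does not cover the confidence intervals quoted in the abstract, which require a separate procedure.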
Authors
José Ferreira Santos, Regina de Brito Duarte, Inês Mota, Rita Carvalheira Santos, José Maria Moreira, Joana Campos, Nuno André Silva, Bernardo Neves, Francisca Leite, Hélder Dores




