Explainable vision transformer-based scoring of CAD-RADS from coronary CT angiography multiplanar projections
European Heart Journal - Digital Health

Abstract
Accurate evaluation of the severity and extent of Coronary Artery Disease (CAD) is critical for managing cardiovascular risk. Coronary CT Angiography (CCTA) is the first-line non-invasive modality for this purpose. Yet, interpreting scans and assigning standardized CAD-RADS scores [1] remains a resource-intensive and operator-dependent task. This creates a demand for automated and transparent strategies to support routine screening.
Building on a MaxViT AI model we previously developed for this purpose [2], we present a novel web-based platform for CAD-RADS scoring from CCTA that requires no annotations beyond the final predicted risk label. The system is designed to reflect real-world clinical decision-making, offering not only categorical risk prediction but also intuitive visual and textual explainability to build user confidence and foster adoption.
We employ 2D representations of the left anterior descending (LAD), left circumflex (LCX), and right coronary arteries (RCA) extracted from standard CCTA. The LAD, LCX, and RCA views are combined into tri-channel composite images, with generative imputation of any missing vessel view (e.g. when one of the vessels is fully healthy). A Multi-Axis Vision Transformer (MaxViT) was fine-tuned to perform two tasks: (1) stratify patients into low (CAD-RADS 0), intermediate (1-3), or high (4+) CAD risk bins; and (2) identify cases warranting further invasive investigation (e.g. invasive coronary angiography, ICA). Additional outputs include heatmaps highlighting the salient image regions driving the AI prediction, computed with DeepSHAP [3], and automatically generated narrative summaries of findings, produced through similarity-based retrieval-augmented generation (RAG) over radiologist-written CCTA reports.
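The tri-channel composition step can be sketched as below. The function name and the mean-of-available-channels fallback are illustrative assumptions only; the actual system uses a generative model to impute a missing vessel view:

```python
import numpy as np

def make_composite(lad, lcx, rca):
    """Stack per-vessel 2D MPR views into one H x W x 3 composite image.

    Assumed channel order: LAD, LCX, RCA. A missing view (None) is filled
    with the mean of the available channels -- a naive stand-in for the
    generative imputation described in the abstract.
    """
    views = [lad, lcx, rca]
    present = [v for v in views if v is not None]
    if not present:
        raise ValueError("at least one vessel view is required")
    # Naive placeholder for the paper's generative missing-vessel imputation
    impute = np.mean(present, axis=0)
    filled = [v if v is not None else impute for v in views]
    return np.stack(filled, axis=-1).astype(np.float32)

# Example: RCA view absent (e.g. a fully healthy vessel)
lad = np.random.rand(224, 224)
lcx = np.random.rand(224, 224)
composite = make_composite(lad, lcx, None)
print(composite.shape)  # (224, 224, 3)
```

The resulting three-channel array matches the RGB input layout expected by an ImageNet-pretrained MaxViT backbone, which is what makes fine-tuning on standard vision weights straightforward.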
The model was trained and tested on a cohort of 253 patients. In the three-class CAD risk prediction task, it achieved an AUC of 0.93 [95% CI: 0.84-0.99] and a classification accuracy of 0.88 [0.79-0.95]. For the binary decision task (whether follow-up invasive investigation is needed), it achieved an AUC of 0.87 [0.76-0.95]. Comparative benchmarks demonstrated superior performance against conventional CNN and attention-based baselines. The trained model is computationally lightweight and can be deployed on the modest hardware typically available to clinical radiology staff. A working prototype with an interactive dashboard was developed (see Figure 1) and tested, showing good usability for non-technical clinical users.
We present an automated, annotation-light CAD-RADS classification tool that mirrors physician workflows using standard CCTA imaging. By combining accuracy with visual and textual explainability, it addresses key limitations in current CAD screening: time burden, inter-rater variability, and lack of transparency. The prototype shows good potential for integration into population-level screening programs without requiring computational resources that remain scarce in healthcare organizations.

