AI in cardiology: level of evidence map

European Heart Journal - Digital Health

12 January 2026

Abstract

Introduction

Artificial intelligence (AI) is revolutionising medical care and is expected to play a prominent role in shaping the future of cardiovascular medicine. However, the rapid pace of technological advancements is not matched by equivalent progress in the regulation and clinical appraisal of these systems.

Purpose

This review aimed to evaluate the maturity and clinical relevance of AI technologies in cardiology by assessing the grade of evidence (GoV) supporting diagnostic, prognostic, and therapeutic applications.

Methods

Following PRISMA guidelines, we conducted a systematic review of studies published between 2014 and 2025 and indexed in Medline and Embase, using MeSH terms specific to cardiology and AI that were developed by cardiologists and AI experts. The strength of evidence was assessed using the GoV framework, ranging from GoV 1 (internal validation, excluded) to GoV 6 (systematic reviews and meta-analyses of RCTs). Only studies with GoV ≥2 were included: external validation by the original team (GoV 2), external validation by an independent group (GoV 3), prospective multicentre studies (GoV 4), RCTs (GoV 5), or meta-analyses (GoV 6). Data were collected on study design (prospective, retrospective, cross-sectional), primary objective (diagnostic, therapeutic, prognostic), cardiology subspecialty, sample size, adherence to 8 reporting guidelines, data openness, and Altmetric scores.
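The grading scheme described above can be sketched as a small ordered enumeration. This is only an illustration of the inclusion rule (GoV ≥2) as stated in the Methods; the member names, `MIN_INCLUDED_GRADE`, `is_included`, and `mean_grade` are hypothetical and are not taken from the review itself.

```python
from enum import IntEnum


class GoV(IntEnum):
    """Evidence grades as listed in the Methods (member names are illustrative)."""
    INTERNAL_VALIDATION = 1       # internal validation only, excluded
    EXTERNAL_SAME_TEAM = 2        # external validation by the original team
    EXTERNAL_INDEPENDENT = 3      # external validation by an independent group
    PROSPECTIVE_MULTICENTRE = 4   # prospective multicentre study
    RCT = 5                       # randomised controlled trial
    META_ANALYSIS = 6             # systematic review / meta-analysis of RCTs


# Assumed name for the inclusion threshold stated in the Methods (GoV >= 2).
MIN_INCLUDED_GRADE = GoV.EXTERNAL_SAME_TEAM


def is_included(grade: GoV) -> bool:
    """Return True if a study with this grade meets the inclusion threshold."""
    return grade >= MIN_INCLUDED_GRADE


def mean_grade(grades: list[GoV]) -> float:
    """Average grade of a group of studies, e.g. one cardiology subspecialty."""
    if not grades:
        raise ValueError("at least one graded study is required")
    return sum(grades) / len(grades)


if __name__ == "__main__":
    # Made-up grades for three hypothetical studies, for illustration only.
    example = [GoV.EXTERNAL_SAME_TEAM, GoV.EXTERNAL_INDEPENDENT, GoV.RCT]
    print([g.name for g in example if is_included(g)])
    print(round(mean_grade(example), 2))
```

Using an ordered integer enumeration keeps the threshold comparison and the per-subspecialty averages consistent with the numeric grades used in the Results.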

Results

A total of 4,518 articles were initially identified, of which 972 were assessed for external validation; 515 studies (53.0%) met the minimum GoV of 2 and were included. The number of studies generally decreased with increasing GoV grade (percentages relative to the 972 assessed articles): GoV 2 (n=290, 29.8%), GoV 3 (n=103, 10.6%), GoV 4 (n=71, 7.3%), GoV 5 (n=20, 2.1%), and GoV 6 (n=31, 3.2%). Among the 484 studies classified by design, most were retrospective (n=305, 63.0%), followed by prospective (n=110, 22.7%) and cross-sectional (n=69, 14.3%). Diagnostic applications dominated among the 515 included studies (n=347, 67.4%), followed by prognostic (n=133, 25.8%) and therapeutic applications (n=35, 6.8%). Cardiovascular imaging (n=219) and non-invasive electrophysiology (n=180) were the most frequent fields, though they ranked lower in average GoV grade (2.86 and 2.76, respectively). Higher validation quality was observed in electrophysiology (3.16), coronary syndrome (3.14), hypertension (3.11), and valvular disease (3.05). Most models lacked open access or transparent sharing, with only 44.7% of studies declaring data availability. Only 5.6% adhered to TRIPOD guidelines.
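The reported percentages use different denominators, which is easy to miss when reading the counts. The short check below recomputes them from the figures given above; the variable names and the `pct` helper are illustrative, and the design denominator of 484 is simply the sum of the three reported design counts, not a figure stated by the authors.

```python
ASSESSED = 972  # articles assessed for external validation

# Counts per GoV grade, as reported in the Results.
gov_counts = {2: 290, 3: 103, 4: 71, 5: 20, 6: 31}
design_counts = {"retrospective": 305, "prospective": 110, "cross-sectional": 69}
objective_counts = {"diagnostic": 347, "prognostic": 133, "therapeutic": 35}

included = sum(gov_counts.values())          # studies with GoV >= 2
design_total = sum(design_counts.values())   # derived here, not reported directly


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * count / total, 1)


# GoV percentages are relative to the assessed articles.
gov_pct = {grade: pct(n, ASSESSED) for grade, n in gov_counts.items()}
# Design percentages are relative to the studies classified by design.
design_pct = {name: pct(n, design_total) for name, n in design_counts.items()}
# Objective percentages are relative to the included studies.
objective_pct = {name: pct(n, included) for name, n in objective_counts.items()}

if __name__ == "__main__":
    print("included:", included, "=", pct(included, ASSESSED), "% of assessed")
    print("by GoV:", gov_pct)
    print("by design:", design_pct)
    print("by objective:", objective_pct)
```

Under these assumptions the recomputed values agree with the published percentages, which suggests the GoV shares refer to the 972 assessed articles while the objective shares refer to the 515 included studies.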

Conclusions

Despite growing interest and promising diagnostic applications, especially in imaging and ECG analysis, most AI models in cardiology lack robust external validation, prospective testing, and therapeutic relevance. This evidence map reveals substantial gaps in validation quality, transparency, and clinical applicability. Advancing safe and effective implementation will require coordinated efforts to improve methodological rigour, reporting standards, and real-world integration of AI technologies.