Signal or noise? Evaluating commonly used attribution methods for explaining deep neural networks in electrocardiogram classification

European Heart Journal - Digital Health

10 March 2026
Organised by: ESC Journals

Abstract

Aims

Attribution-based explainability methods are widely used in electrocardiogram (ECG) analysis to interpret predictions from ‘black-box’ deep neural networks (DNNs). To be useful in clinical applications, attribution methods must produce explanations that are both clear and reflective of the model’s inner workings. This study evaluates 12 attribution methods in DNN-based ECG classification.

Methods and results

We analysed 12 attribution methods using a dataset of 873 710 median beat ECGs spanning nine diagnostic classes. Methods were applied to convolutional neural network-based models trained for ECG classification. Performance was evaluated across four experiments: inter-method similarity, self-consistency, dependence on model weights, and ability to identify features important for model inference. All task models achieved an area under the receiver operating characteristic curve above 0.95. Attribution methods demonstrated low correlation and high variability across inter-method comparisons. Self-consistency across random model initializations was moderate for most methods (mean correlation 0.41–0.65). Randomizing model weights led to rapid loss of correlation, although some methods did not converge to zero. Perturbation of input data revealed differences in how well attribution methods identified features relevant to model performance.
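
As an illustration of the inter-method similarity experiment, the sketch below computes pairwise correlations between attribution maps produced by different methods for a single ECG. The function name, the input layout (one array of shape leads x samples per method) and the use of Pearson correlation are assumptions made for illustration; the abstract does not state which correlation measure or implementation was used.

```python
import numpy as np


def pairwise_attribution_correlation(attributions):
    """Pearson correlation between attribution maps of different methods.

    `attributions` is a hypothetical dict mapping a method name to an
    array of shape (n_leads, n_samples) for the same ECG. Returns the
    sorted method names and a symmetric correlation matrix.
    """
    names = sorted(attributions)
    # Flatten each map so every method becomes one row (one variable).
    flat = np.stack(
        [np.asarray(attributions[name], dtype=float).ravel() for name in names]
    )
    # np.corrcoef treats rows as variables by default (rowvar=True).
    corr = np.corrcoef(flat)
    return names, corr


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for attribution maps (12 leads, 600 samples);
    # these are random numbers, not outputs of real attribution methods.
    maps = {
        "method_a": rng.normal(size=(12, 600)),
        "method_b": rng.normal(size=(12, 600)),
    }
    names, corr = pairwise_attribution_correlation(maps)
    print(names)
    print(np.round(corr, 2))
```

The same function could in principle serve the self-consistency experiment by passing maps from one method computed on differently initialized models. A constant attribution map has zero variance and yields NaN entries, so such maps would need separate handling.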

Conclusion

Attribution methods demonstrated limited reliability, instability across model variants and incomplete dependence on learned parameters, constraining their utility in high-stakes settings such as healthcare. These findings suggest that attribution techniques should be used cautiously and supported by task-specific sanity checks. Approaches grounded in rigorous validation, inherently interpretable modelling or counterfactual explanations may better support clinically meaningful insight.

Contributors

Pim van der Harst

Author

University Medical Centre Groningen, Groningen, The Netherlands

René van Es

Author

University Medical Center Utrecht, Utrecht, The Netherlands
