Self-supervised learning can tackle life-threatening arrhythmia detection data challenges
European Heart Journal - Digital Health

Abstract
Life-threatening arrhythmias (LTAs), such as ventricular tachycardia and ventricular fibrillation, are critical cardiac events that require immediate intervention to prevent sudden cardiac death. Continuous monitoring combined with accurate LTA detection from electrocardiogram (ECG) signals could enable timely intervention. However, two key challenges hinder the development of robust LTA detection models: low data availability and severe class imbalance, both caused by the rarity of LTA events and the difficulty of collecting annotated recordings.
Foundation Models (FMs) have recently emerged as powerful large-scale deep learning models that are trained on very large datasets and can be easily adapted to different downstream tasks. They typically rely on self-supervised learning (SSL) to learn from unlabeled data, saving the time and resources otherwise spent on data labeling.
In this work, we aim to overcome low data availability and class imbalance with a novel SSL-based FM, pre-trained on a large set of unlabeled ECGs and fine-tuned on limited LTA data for LTA detection. The ultimate goal is a reliable detection system that, combined with wearable devices, can be used for continuous monitoring and to immediately alert local emergency services.
We pre-trained our model with an SSL approach on 406,117 ECGs without LTA events from open-access datasets. We then fine-tuned it on the target dataset (82 recordings including LTA events).
We trained different models to compare two batch sampling strategies (uniform and stratified) and four loss functions: binary cross-entropy (BCE) and its weighted, focal, and weighted-focal variants. Class-weighted loss and uniform sampling aim to improve detection of the underrepresented LTA class, while focal loss emphasizes hard-to-classify samples. The SSL approach, in contrast, addresses data scarcity by leveraging the knowledge extracted from the large pre-training dataset.
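The loss and sampling options above can be illustrated with a minimal sketch, assuming the standard formulations: the focal term follows the usual (1 - p_t)^gamma modulation, class weighting multiplies the loss of LTA samples by a factor `pos_weight`, "uniform" sampling is read as class-balanced batches (consistent with its stated goal of boosting the minority class), and "stratified" sampling keeps the dataset's class ratio in each batch. The function names, `pos_weight`, `gamma`, and the batch composition rules are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

EPS = 1e-7  # clamp probabilities so log() never sees 0


def bce_family_loss(p: float, y: int, pos_weight: float = 1.0,
                    gamma: float = 0.0) -> float:
    """Per-sample loss for predicted LTA probability p and label y (1 = LTA).

    Hypothetical helper covering the four variants:
    pos_weight == 1, gamma == 0 -> binary cross-entropy (BCE)
    pos_weight  > 1, gamma == 0 -> class-weighted BCE
    pos_weight == 1, gamma  > 0 -> focal loss
    pos_weight  > 1, gamma  > 0 -> weighted-focal loss
    """
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p     # probability assigned to the true class
    w = pos_weight if y == 1 else 1.0  # up-weight errors on the rare LTA class
    # (1 - p_t) ** gamma shrinks the loss of easy, confidently correct samples
    return -w * (1.0 - p_t) ** gamma * math.log(p_t)


def sample_batch(labels: Sequence[int], batch_size: int, strategy: str,
                 rng: random.Random) -> List[int]:
    """Return the dataset indices forming one batch (hypothetical helper)."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if strategy == "uniform":
        # Assumed class-balanced: half LTA, half non-LTA, drawn with
        # replacement so the few LTA samples can be reused (oversampling).
        half = batch_size // 2
        return rng.choices(pos, k=half) + rng.choices(neg, k=batch_size - half)
    if strategy == "stratified":
        # Preserve the dataset's class ratio in every batch (no replacement).
        n_pos = round(batch_size * len(pos) / len(labels))
        return rng.sample(pos, n_pos) + rng.sample(neg, batch_size - n_pos)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

For example, with 10 LTA samples among 100, a batch of 20 contains 10 LTA samples under `uniform` and 2 under `stratified`, which shows why the two strategies expose the model to the minority class very differently during training.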
Finally, we compared the SSL approach with traditional supervised training of the model directly on the target LTA dataset.
The best SSL model achieved 92.21% macro F1 (MF1), 97.35% sensitivity (Se), and 97.36% specificity (Sp) with the baseline (binary cross-entropy) loss and stratified batch sampling. The best supervised model achieved 89.55% MF1, 95.5% Se, and 96.4% Sp with weighted-focal loss and uniform batch sampling. Overall, the SSL approach (average across all fine-tuned models: 97.38 ± 1.44% Se, 96.21 ± 0.62% Sp, 89.67 ± 1.59% MF1) outperformed the fully supervised one (93.37 ± 2.43% Se, 93.28 ± 2.02% Sp, 83.17 ± 3.83% MF1), achieving high performance regardless of the balancing strategy.
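The reported metrics can be made concrete with a minimal sketch, assuming binary predictions (1 = LTA, 0 = non-LTA), macro F1 computed as the unweighted mean of the per-class F1 scores for the two classes, and the "±" summaries computed as mean and sample standard deviation across fine-tuned models (the abstract does not state which spread measure was used). The helper names are hypothetical.

```python
import statistics
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) with 1 = LTA (positive) and 0 = non-LTA."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0 when the class never appears."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def lta_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and macro F1, all in percent."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    se = tp / (tp + fn) if tp + fn else 0.0  # recall on the LTA class
    sp = tn / (tn + fp) if tn + fp else 0.0  # recall on the non-LTA class
    # Macro F1 averages the two per-class F1 scores without weighting by
    # class size, so the rare LTA class counts as much as the common one.
    f1_lta = _f1(tp, fp, fn)
    f1_non = _f1(tn, fn, fp)  # roles of FP and FN swap for the negative class
    return {"Se": 100 * se, "Sp": 100 * sp, "MF1": 100 * (f1_lta + f1_non) / 2}


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across models (assumed spread)."""
    return statistics.mean(values), statistics.stdev(values)
```

Because macro F1 weights both classes equally, it penalizes missed or spurious LTA alarms more visibly than accuracy would on an imbalanced dataset, which is why it complements Se and Sp here.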
The SSL model showed superior LTA detection performance and robustness to data scarcity and class imbalance without requiring specific balancing strategies, paving the way for reliable real-time LTA monitoring systems that enable timely, life-saving intervention.
Figure: Performance comparison
Contributors

G Conte
Author

A Tzovara
Author

F D Faraci
Author
Institute of Digital Technologies for Personalized Healthcare, Viganello, Switzerland

