Deep learning-based automatic segmentation in pediatric cardiovascular imaging
European Heart Journal - Digital Health

Abstract
Automatic segmentation of cardiovascular structures from medical images is essential for the diagnosis and monitoring of congenital heart disease. Manual segmentation is labour-intensive and time-consuming. Deep learning models, including the U-Net architecture, offer an automatic and precise alternative. Achieving high accuracy is crucial but challenging, and it depends heavily on dataset quality, data processing strategies, and model architecture.
This study set out to develop and evaluate a 3D U-Net model tailored for segmenting major cardiovascular structures in pediatric medical images. Our goal was to understand how different strategies—such as dataset composition (local, open-source, and combined), data augmentation, hyperparameter tuning, and architectural enhancements like attention mechanisms—affect segmentation performance.
We trained a 3D U-Net using three dataset configurations: a local clinical dataset (48 patients), an open-source dataset, and a combined dataset (85 patients). Training was conducted on a high-performance computing (HPC) system equipped with Tesla V100 GPU hardware. To assess the impact of various optimization methods, we trained models with and without data augmentation, which consisted of random affine and elastic transformations. We also performed hyperparameter tuning with Optuna and compared the results against untuned baseline models. In addition, we compared a standard 3D U-Net architecture with a modified version incorporating Attention Gates. We assessed model performance using the Dice Similarity Coefficient (DSC) and the Jaccard Index.
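The two evaluation metrics can be illustrated with a minimal sketch. It assumes binary masks for a single structure stored as NumPy arrays; the function name `dice_and_jaccard` and the convention of scoring two empty masks as perfect agreement are illustrative assumptions, not details taken from the study.

```python
import numpy as np

def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two binary masks of identical shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    pred_size = pred.sum()
    target_size = target.sum()

    # Assumed convention: two empty masks count as perfect agreement.
    if pred_size + target_size == 0:
        return 1.0, 1.0

    union = pred_size + target_size - intersection
    dice = 2.0 * intersection / (pred_size + target_size)
    jaccard = intersection / union
    return float(dice), float(jaccard)

# Toy 3D example: two 4x4x4 cubes in an 8x8x8 volume, offset by one slice.
pred = np.zeros((8, 8, 8), dtype=bool)
target = np.zeros((8, 8, 8), dtype=bool)
pred[2:6, 2:6, 2:6] = True    # 64 voxels
target[3:7, 2:6, 2:6] = True  # 64 voxels, 48 of them shared with pred
dice, jaccard = dice_and_jaccard(pred, target)
```

For a multi-structure segmentation, one plausible approach is to apply the function once per label and average the scores; the abstract does not state how the reported means were aggregated.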
We trained the U-Net model on the combined dataset of 85 patients, with training configured for 1000 epochs. Several methods were applied to improve performance, including data augmentation and Optuna hyperparameter tuning, and the predicted masks showed performance differences depending on the methodology applied. The model achieved a mean Dice coefficient of 0.8330 and a Jaccard Index of 0.7356. In contrast, incorporating Attention Gates into the model resulted in a lower mean Dice coefficient of 0.7774. Qualitative results (Figure 1) demonstrate the segmentation capabilities of models trained on the combined dataset when tested on unseen local and open-source patient data. A summary of these key performance metrics is presented in Table 1.
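As a sanity check on the reported figures, note that for a single pair of masks the Jaccard Index J and the Dice coefficient D satisfy J = D / (2 - D). Because this function is convex in D, the mean Jaccard over a set of cases cannot fall below the value obtained by plugging the mean Dice into the identity. The sketch below applies that bound to the reported values; it assumes both means were taken over the same set of cases or structures, which the abstract does not state.

```python
def jaccard_from_dice(dice):
    """Per-case identity J = D / (2 - D) for the overlap of two masks."""
    return dice / (2.0 - dice)

reported_mean_dice = 0.8330     # value reported in the abstract
reported_mean_jaccard = 0.7356  # value reported in the abstract

# Lower bound on the mean Jaccard implied by the mean Dice (Jensen's inequality).
lower_bound = jaccard_from_dice(reported_mean_dice)
consistent = reported_mean_jaccard >= lower_bound

print(f"lower bound on mean Jaccard: {lower_bound:.4f}, consistent: {consistent}")
```

A reported mean Jaccard above this bound is what one would expect when per-case scores vary; equality would hold only if every case had the same Dice score.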
On the combined, heterogeneous dataset, a standard 3D U-Net architecture without augmentation yielded the most consistent segmentation performance. Pre-processing techniques such as contrast-limited adaptive histogram equalization (CLAHE) remain valuable complementary methods for enhancing data quality.
Contributors

J S F Josephin, K B Kose, F Z Gungoren, M H Alzaeim, O F Sahin, I Yilmaz, M A K Umair, M O Alboushi, R Mirza, F Aktay, B N Yazici, A Boray, Z Dulli, I Faress
