ESTRO 2023

Session Item

Saturday

May 13

10:30 - 11:30

Lehar 1-3

Autosegmentation & automation for QA

Co-Chair: Daniel Sandys, United Kingdom;

Chair: Jan Lagendijk, The Netherlands

Session Type: Proffered Papers

Track: Physics

11:30 - 11:40

Deep Learning Segmentation of Cardiac Substructures in Radiotherapy Planning

Leonard Nuernberg, The Netherlands

Presentation Number: OC-0122

Abstract

Authors:

Leonard Nürnberg¹, Dennis Bontempi¹, Karolien Verhoeven¹, Hayian Zeng¹, Richard Canters¹, Enrique Hortal Quesada², Francesca Romana Giglioli³, Umberto Ricardi⁴, Mario Levis⁴, Dirk De Ruysscher¹, Alberto Traverso¹

¹GROW School for Oncology and Reproduction, Maastricht University Medical Centre+, Department of Radiation Oncology (Maastro), Maastricht, The Netherlands; ²Maastricht University, Department of Data Science and Knowledge Engineering, Maastricht, The Netherlands; ³A.O.U. Città della Salute e della Scienza di Torino, Medical Physics Unit, Turin, Italy; ⁴University of Turin, Department of Oncology, Turin, Italy

Purpose or Objective

Radiosensitivity varies between the substructures of the heart. However, manually segmenting these substructures on the planning computed tomography (pCT), as required for dose re-calculations, is time-consuming and prone to inter- and intra-observer variability. In this study, we aim to develop a deep learning (DL) solution for segmenting the cardiac substructures in pCT.

Material and Methods

A cohort of 180 lymphoma and lung cancer patients treated with radiotherapy (RT) was retrospectively collected. Left and right ventricles (LV, RV), left and right atria (LA, RA), right coronary artery (RCA), left anterior descending artery (LAD), and the circumflex branch of the left coronary artery (CFLX) were manually annotated. The training, validation, and testing split was 70%/15%/15%. We extracted 53 axial 149 × 149 slices for each patient, limited to the maximum cardiac border, and developed seven individual 2D U-Net models, one per region of interest (ROI). We investigated the effect of heart masking (removing all CT data outside the heart contour) and of four different loss functions: binary cross-entropy loss (BCEL), binary Dice score loss (BDL), binary focal loss (BFL), and the combo loss (CL). We explored different thresholds (TH) for generating binary segmentation masks. As an evaluation metric, we used the Surface Dice score (SDS) with a tolerance of 3 mm for small ROIs and 5 mm for large ROIs. Two radiation oncologists (RO) conducted a quality assessment of all the automatic annotations obtained on the test set. Each RO estimated the time spent on modifications, which we compared to the original delineation time.
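The preprocessing, thresholding, and loss terms named above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Python on flattened pixel lists, and the function names, the fill value for masked pixels, the focal parameter gamma, and the weight alpha are all assumptions chosen for illustration. In particular, the combo loss is taken here to be a weighted sum of cross-entropy and Dice loss; the abstract does not state the exact formulation used.

```python
import math

EPS = 1e-7  # numerical guard against log(0) and division by zero


def apply_heart_mask(ct_slice, heart_mask, fill=-1024.0):
    """Replace every pixel outside the heart contour with a fill value.

    The fill value (air, in HU) is an assumption; the abstract does not state it.
    """
    return [
        [px if inside else fill for px, inside in zip(ct_row, mask_row)]
        for ct_row, mask_row in zip(ct_slice, heart_mask)
    ]


def binarize(probs, threshold=0.5):
    """Turn predicted foreground probabilities into a binary mask."""
    return [1 if p >= threshold else 0 for p in probs]


def bce_loss(probs, target):
    """Binary cross-entropy, averaged over pixels."""
    total = 0.0
    for p, t in zip(probs, target):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def dice_loss(probs, target):
    """Soft Dice loss: 1 - 2 * overlap / (sum of predictions + sum of targets)."""
    overlap = sum(p * t for p, t in zip(probs, target))
    return 1.0 - (2.0 * overlap + EPS) / (sum(probs) + sum(target) + EPS)


def focal_loss(probs, target, gamma=2.0):
    """Binary focal loss: cross-entropy down-weighted for well-classified pixels."""
    total = 0.0
    for p, t in zip(probs, target):
        p_t = p if t == 1 else 1.0 - p  # probability assigned to the true class
        p_t = min(max(p_t, EPS), 1.0 - EPS)
        total -= (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def combo_loss(probs, target, alpha=0.5):
    """Weighted sum of BCE and Dice loss (alpha is a hypothetical weight)."""
    return alpha * bce_loss(probs, target) + (1.0 - alpha) * dice_loss(probs, target)


# Toy example: four pixels, the first two belong to the structure.
probs = [0.9, 0.8, 0.3, 0.1]
target = [1, 1, 0, 0]
print(binarize(probs, threshold=0.5))  # [1, 1, 0, 0]
print(binarize(probs, threshold=0.2))  # [1, 1, 1, 0]
print(combo_loss(probs, target))
```

In the toy example, lowering the threshold from 0.5 to 0.2 adds a background pixel to the mask, which is the kind of over-contouring a lower TH can produce.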

Results

The average SDS for a test patient was 0.74 ± 0.21. The results per ROI were LV (0.88 ± 0.07), LA (0.88 ± 0.09), RA (0.87 ± 0.09), RV (0.83 ± 0.08), LAD (0.70 ± 0.16), CFLX (0.56 ± 0.27), and RCA (0.48 ± 0.18). The best results were obtained when masking was applied. The influence of the loss function differed for each ROI. While BCEL and BDL led to better results for the large ROIs (RV, LV, RA, LA), only the CL led to usable results for the small ones (RCA, LAD, CFLX). Selecting a lower TH led to a better SDS for the RCA (p < 0.05) but caused over-contouring, as revealed by the RO assessment. 100% of large ROIs and 55% of small ROIs were rated acceptable by all the ROs. Disagreement between experts on the acceptance of a ROI was found only for the LAD, CFLX, and RCA (9%, 18%, and 41%, respectively). Using the automatic annotations as a prior could save ROs about 20 minutes per patient (> 50%).
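To make the metric concrete, the following sketch shows a point-based approximation of the surface Dice score and a simple consistency check on the reported numbers. It assumes both surfaces are given as lists of uniformly sampled points in millimetres; published definitions of the metric weight by surface area rather than by point count, so this is a simplification. The function name and the toy coordinates are hypothetical. The check at the end only confirms that the unweighted mean of the seven per-ROI means rounds to the reported 0.74; it is not a recomputation of the patient-level statistic.

```python
import math
from statistics import mean


def surface_dice(surface_a, surface_b, tolerance_mm):
    """Fraction of surface points lying within `tolerance_mm` of the other surface.

    Returns 0.0 if either surface is empty (a convention chosen for this sketch).
    """
    if not surface_a or not surface_b:
        return 0.0

    def n_close(src, dst):
        # Count points in `src` whose nearest neighbour in `dst` is within tolerance.
        return sum(
            1 for p in src if min(math.dist(p, q) for q in dst) <= tolerance_mm
        )

    close = n_close(surface_a, surface_b) + n_close(surface_b, surface_a)
    return close / (len(surface_a) + len(surface_b))


# Toy contours: one pair of points is 2 mm apart, the other 4 mm apart.
reference = [(0.0, 0.0), (10.0, 0.0)]
predicted = [(0.0, 2.0), (10.0, 4.0)]
print(surface_dice(reference, predicted, tolerance_mm=3.0))  # 0.5
print(surface_dice(reference, predicted, tolerance_mm=5.0))  # 1.0

# Mean SDS per ROI, copied from the Results text.
sds_mean = {
    "LV": 0.88, "LA": 0.88, "RA": 0.87, "RV": 0.83,
    "LAD": 0.70, "CFLX": 0.56, "RCA": 0.48,
}
overall = mean(sds_mean.values())
print(round(overall, 2))  # 0.74
```

The toy contours show why the tolerance matters: the same prediction scores 0.5 at 3 mm and 1.0 at 5 mm, so SDS values for small and large ROIs are not directly comparable.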

Conclusion

We trained a DL pipeline that delineates the LV, RV, LA, and RA with high clinical acceptance. We propose heart masking during preprocessing for all ROIs and highlight the CL for the smaller ones. Using our model, delineation times for all ROIs can be significantly reduced. We find that evaluation metrics cannot replace clinical evaluation: for the LA, RA, and CFLX, we observe increasing mean SDS values for decreasing quality levels.