Copenhagen, Denmark
Onsite/Online

ESTRO 2022

Session Item

Radiomics, modelling and statistical methods
Poster (digital)
Physics
Evaluation of two commercial deep learning OAR segmentation models for prostate cancer treatment
Jenny Gorgisyan, Sweden
PO-1776

Abstract

Authors:

Jenny Gorgisyan1, Ida Bengtsson1, Michael Lempart1,2, Minna Lerner1,2, Elinore Wieslander1, Sara Alkner3,4, Christian Jamtheim Gustafsson1,2

1Skåne University Hospital, Department of Hematology, Oncology and Radiation Physics, Lund, Sweden; 2Lund University, Department of Translational Sciences, Medical Radiation Physics, Malmö, Sweden; 3Lund University, Department of Clinical Sciences Lund, Oncology and Pathology, Lund, Sweden; 4Skåne University Hospital, Clinic of Oncology, Department of Hematology, Oncology and Radiation Physics, Lund, Sweden

Purpose or Objective

To evaluate two commercial, CE-labeled, deep learning-based models for automatic organ-at-risk (OAR) segmentation on planning CT images for prostate cancer radiotherapy. The evaluation assessed geometric agreement with clinical delineations and the potential time saving.

Material and Methods

The evaluated models were RayStation 10B Deep Learning Segmentation (RaySearch Laboratories AB, Stockholm, Sweden) and MVision AI Segmentation Service (MVision, Helsinki, Finland). Both were applied to CT images from a dataset of 54 male pelvis patients. The RaySearch model was re-trained for the femoral head structures with 44 clinic-specific patients (Skåne University Hospital, Lund, Sweden) to adapt it to our delineation guidelines, and the re-trained model was evaluated on 10 patients from the same clinic. The Dice similarity coefficient (DSC) and the 95th percentile Hausdorff distance were computed with an in-house Python script. The average time for manual and AI model delineation was recorded.
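The in-house script is not part of this abstract. As an illustration only, the two metrics could be computed along the following lines. This is a minimal sketch assuming binary masks on a common voxel grid, with the 95th percentile Hausdorff distance taken as the larger of the two directed 95th percentile surface distances (definitions vary between tools). The function names `dice`, `surface_points` and `hd95` are hypothetical, and the brute-force distance matrix is only practical for small structures.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / total)


def surface_points(mask, spacing):
    """Physical coordinates of mask voxels with at least one background face neighbour."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        # A voxel is interior only if both neighbours along every axis are set.
        interior &= np.roll(padded, 1, axis=axis) & np.roll(padded, -1, axis=axis)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    surface = mask & ~interior[core]
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)


def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th percentile Hausdorff distance, in the units of `spacing`."""
    pa = surface_points(a, spacing)
    pb = surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("both masks must contain at least one voxel")
    # Pairwise Euclidean distances between the two surface point sets.
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    a_to_b = np.percentile(d.min(axis=1), 95)
    b_to_a = np.percentile(d.min(axis=0), 95)
    return float(max(a_to_b, b_to_a))


# Toy example: two 4x4x4 cubes offset by one voxel along the first axis.
reference = np.zeros((10, 10, 10), dtype=bool)
reference[2:6, 2:6, 2:6] = True
predicted = np.zeros((10, 10, 10), dtype=bool)
predicted[3:7, 2:6, 2:6] = True
print(dice(reference, predicted), hd95(reference, predicted))
```

For full-resolution CT structures, a distance transform or a KD-tree would be a more memory-friendly way to obtain the nearest-surface distances.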

Results

Average DSC scores and Hausdorff distances for all patients and both models are presented in Figure 1 and Table 1, respectively. Re-training the RaySearch model increased the overlap of the femoral head segmentations with our clinical data: for the right femoral head, the DSC (mean ± 1 STD) increased from 0.55±0.06 (n=53) to 0.91±0.02 (n=10) and the mean Hausdorff distance decreased from 55±7 mm (n=53) to 4±1 mm (n=10), with similar results for the left femoral head. The femoral head deviation seen with the original RaySearch and MVision models arose from a difference in the femoral head segmentation guideline used for the clinic-specific data, see Figure 2. Manual delineation took 13 minutes on average, compared with 0.5 minutes (RaySearch) and 1.4 minutes (MVision) for the AI models, manual correction not included.

Figure 1. DSC scores (mean values with 1 STD as error bars) for the RaySearch model (top) and MVision model (bottom).

Table 1. Mean Hausdorff distance ± 1 STD (mm) for different anatomical structures presented for both models.


            FemoralHead_R   FemoralHead_L   Bladder   Rectum   BowelBag   Penilebulb
            n=53            n=53            n=54      n=53     n=13       n=25
RaySearch   55±7            53±7            5±5       18±10    --
MVision     59±5            59±5            4±4       12±7     140±23     7±2

Figure 2. Femoral head segmentation: clinical data (left), original RaySearch model result (middle) and re-trained RaySearch model result (right). The clinical segmentation includes only a sphere-like structure representing the femoral head, whereas the segmentation from the original RaySearch model includes both the femoral head and neck.

Conclusion

Both AI models demonstrate good segmentation performance for bladder and rectum. Clinic-specific training data (or data that complies with the clinic-specific delineation guideline) might be necessary to achieve segmentation results in accordance with the clinic-specific standard for some anatomical structures, such as the femoral heads in our case. The time saving was around 90%, not including manual correction.
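The time saving can be reproduced from the delineation times reported in the Results. The snippet below is a simple arithmetic check, not part of the original analysis. It assumes the saving is defined relative to the manual delineation time, and the helper name `relative_time_saving` is hypothetical.

```python
# Average delineation times in minutes, as reported in the Results.
MANUAL_MIN = 13.0
AI_MIN = {"RaySearch": 0.5, "MVision": 1.4}


def relative_time_saving(manual, automatic):
    """Time saved by automatic delineation, as a percentage of the manual time."""
    return 100.0 * (manual - automatic) / manual


for model, minutes in AI_MIN.items():
    saving = relative_time_saving(MANUAL_MIN, minutes)
    print(f"{model}: {saving:.1f}% time saving")
```

Under this definition the saving is about 96% for RaySearch and about 89% for MVision, with manual correction time excluded from both figures.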