Copenhagen, Denmark
Onsite/Online

ESTRO 2022

Session Item

Saturday
May 07
14:15 - 15:15
Mini-Oral Theatre 1
05: Image acquisition & processing
Malin Kügele, Sweden;
Nanna Sijtsema, The Netherlands
Mini-Oral
Physics
Deep learning auto-segmentation vs human inter observer variability of normal tissue in the brain
Jesper Kallehauge, Denmark
MO-0217

Abstract

Deep learning auto-segmentation vs human inter observer variability of normal tissue in the brain
Authors:

Jesper Kallehauge1, Camilla Skinnerup Byskov2, Christian Rønn Hansen3, Yasmin Lassen-Ramshad1, Anouk Trip1, Slavka Lukacova2, Lene Haldbo-Classen2, Ebbe Laugaard Lorenzen4

1Aarhus University Hospital, Danish Centre for Particle Therapy, Aarhus N, Denmark; 2Aarhus University Hospital, Department of Oncology, Aarhus N, Denmark; 3Odense University Hospital, Laboratory of Radiation Physics, Odense, Denmark; 4Odense University Hospital, Laboratory of Radiation Physics, Odense, Denmark

Purpose or Objective

Automatic segmentations of organs at risk (OARs) are often compared to single-observer delineations, which, especially for small organs, may be subject to considerable inter-observer disagreement. A fairer comparison is to validate the auto-segmentation against a dataset in which multiple observers have delineated the same structures. Hence, the objective of this study was to validate a deep learning semantic segmentation of selected OARs, defined according to the Danish Neuro-Oncology Group (DNOG) guidelines, against multi-observer gold standard segmentations.

Material and Methods

A 3D U-Net architecture was trained to segment the brainstem, hippocampi, chiasm, pituitary gland, and optic tracts on 3D T1-weighted brain MRI of 58 glioma patients (46 patients were scanned at 3 T and 12 at 1.5 T). The network was subsequently evaluated on a hold-out test set of 13 patients (nine scanned at 3 T and four at 1.5 T). The performance of the automatic segmentation was finally evaluated using the Dice similarity coefficient (DSC) and the 95% Hausdorff distance (HD95) on three randomly chosen patients from the gold standard dataset of 13 patients in total. Each structure for each of the three patients was delineated by, on average, 8 radiation oncologists with experience in neuro-oncology. The automatic segmentation was compared to each of the 8 expert segmentations, and the mean HD95 and DSC were determined and compared to the mean expert metrics.
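
As an illustration of the two evaluation metrics named above, the sketch below computes DSC and HD95 between two binary masks. This is not the authors' implementation: it assumes the segmentations are available as boolean NumPy volumes with known voxel spacing, and it uses one common HD95 convention (95th percentile of the pooled symmetric surface distances); other conventions exist.

```python
import numpy as np
from scipy import ndimage

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two boolean masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def surface(mask: np.ndarray) -> np.ndarray:
    """Surface voxels of a mask: foreground voxels with a background neighbour."""
    return mask & ~ndimage.binary_erosion(mask)

def hd95(a: np.ndarray, b: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """95th-percentile symmetric surface distance (one common HD95 convention)."""
    a, b = a.astype(bool), b.astype(bool)
    sa, sb = surface(a), surface(b)
    if not sa.any() or not sb.any():
        return float("inf")  # undefined if either structure is empty
    # Distance from every voxel to the nearest surface voxel of the other mask
    dt_b = ndimage.distance_transform_edt(~sb, sampling=spacing)
    dt_a = ndimage.distance_transform_edt(~sa, sampling=spacing)
    d_ab = dt_b[sa]  # surface of A -> surface of B
    d_ba = dt_a[sb]  # surface of B -> surface of A
    return float(np.percentile(np.concatenate([d_ab, d_ba]), 95))
```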

Results
The auto-segmented structures performed similarly to the expert-delineated structures with respect to DSC and HD95 (Figure 1 and Figure 2), although the optic tracts appeared to perform slightly worse when comparing DSC. The auto-segmentation showed similar or reduced variability across all investigated structures, most notably for the pituitary gland. A few expert segmentations were identified with large deviations from the median metrics, indicative of outliers; in particular, some segmentations with HD95 well beyond the interquartile range are suspicious.
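
The auto-vs-expert comparison and the interquartile-range criterion mentioned above could be sketched roughly as follows, reusing dice() and hd95() from the previous sketch. This is an illustrative reconstruction, not the study's analysis code: the per-structure mask lists, the pairwise inter-observer baseline, and the 1.5 x IQR threshold are all assumptions.

```python
import numpy as np
from itertools import combinations

def auto_vs_expert_metrics(auto_mask, expert_masks, spacing):
    """Mean DSC/HD95 of the auto-segmentation against every expert contour."""
    dscs = [dice(auto_mask, e) for e in expert_masks]
    hds = [hd95(auto_mask, e, spacing) for e in expert_masks]
    return float(np.mean(dscs)), float(np.mean(hds))

def expert_vs_expert_metrics(expert_masks, spacing):
    """Mean pairwise DSC/HD95 among the experts (inter-observer baseline)."""
    dscs, hds = [], []
    for a, b in combinations(expert_masks, 2):
        dscs.append(dice(a, b))
        hds.append(hd95(a, b, spacing))
    return float(np.mean(dscs)), float(np.mean(hds))

def flag_iqr_outliers(values, k=1.5):
    """Indices of values more than k*IQR beyond the quartiles,
    e.g. expert contours with suspiciously large HD95."""
    q1, q3 = np.percentile(values, [25, 75])
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < lo or v > hi]
```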

Conclusion

The proposed deep learning segmentation algorithm generally performed very well, showing performance similar to that of the experts for all structures except the optic tracts when comparing DSC. This discrepancy may be attributed to the small volume of these structures, for which the DSC is known to be especially sensitive to small variations in delineation. The expert segmentations deviating strongly from the general trend of the expert segmentations (HD95 well beyond the interquartile range) were visually inspected and found not to adhere to the DNOG guidelines. The auto-segmentations, on the other hand, were in good agreement with the general expert trend, and a potential added benefit of auto-segmentation could be outlier detection of subpar segmentations, for example in retrospective datasets. To conclude, this deep learning network is of sufficiently high quality to be used in a clinical workflow and shows potential as a tool for delineation outlier detection.