Copenhagen, Denmark
Onsite/Online

ESTRO 2022

Session Item

Lower GI
Poster (digital)
Clinical
Automated rectal tumor segmentation with inter-observer variability-based uncertainty estimates
Luca Weishaupt, USA
PO-1325

Abstract

Automated rectal tumor segmentation with inter-observer variability-based uncertainty estimates
Authors:

Luca Weishaupt1, Té Vuong2, Alana Thibodeau-Antonacci3, Aurelie Garant4, Kelita Singh5, Corey Miller6, André-Guy Martin7, Fynn Schmitt-Ulms8, Shirin Abbasinejad Enger1

1McGill University, Medical Physics Unit, Department of Oncology, Montréal, Canada; 2Jewish General Hospital, Department of Oncology, Montréal, Canada; 3McGill University, Medical Physics Unit, Department of Oncology, Montréal, Canada; 4UT Southwestern Medical Center, Department of Radiation Oncology, Dallas, USA; 5McGill University Health Centre, Division of Gastroenterology, Montréal, Canada; 6McGill University Faculty of Medicine, Department of Medicine, Montréal, Canada; 7Centre hospitalier universitaire de Québec, Department of Radiation Oncology, Quebec City, Canada; 8McGill University, Department of Computer Science, Montréal, Canada

Purpose or Objective

Without biopsies, the task of tumor detection carries an intrinsic uncertainty. However, deep learning models used for automatic tumor detection are typically trained to classify pixels as either tumor or non-tumor, disregarding this uncertainty. This study aims to develop a deep learning-based method that can model this uncertainty.

Material and Methods

Three physicians (gastroenterologists and radiation oncologists) from three different institutions contoured the tumor regions in 1704 endoscopy images from 101 endoscopic exams of 21 patients. Not all images contained tumors.

A deep learning model was trained to classify the pixels that all observers agreed were tumor or non-tumor. Regions with inter-observer variability were considered uncertain.
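One way to derive such labels is to combine the observers' binary masks into a three-class map, as in the following minimal sketch in NumPy (the array layout and function name are illustrative assumptions, not the study's code):

    import numpy as np

    def build_label_mask(observer_masks: np.ndarray) -> np.ndarray:
        """observer_masks: (n_observers, H, W) binary array, 1 = tumor.

        Returns an (H, W) label map:
          0 = non-tumor (all observers agree),
          1 = tumor (all observers agree),
          2 = uncertain (inter-observer disagreement).
        """
        votes = observer_masks.sum(axis=0)                # tumor votes per pixel
        n_observers = observer_masks.shape[0]
        labels = np.full(votes.shape, 2, dtype=np.uint8)  # default: uncertain
        labels[votes == 0] = 0                            # unanimous non-tumor
        labels[votes == n_observers] = 1                  # unanimous tumor
        return labels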

Images from 80 exams (1392 images) were used for training, and images from the remaining 21 exams (312 images) for testing. A DeepLabV3 model was trained in PyTorch with a soft Dice loss function. Images were resized to 512x512 pixels and normalized before being passed into the model. To increase the model’s robustness, the training set was augmented during each epoch with random rotations and random horizontal and vertical flips.
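A minimal training sketch under stated assumptions: torchvision’s deeplabv3_resnet50 is used as the DeepLabV3 variant, Adam with an illustrative learning rate as the optimizer, and the soft Dice loss is implemented directly, since the abstract does not specify these details:

    import random
    import torch
    import torch.nn.functional as F
    from torchvision.models.segmentation import deeplabv3_resnet50
    from torchvision.transforms import functional as TF

    def soft_dice_loss(logits, targets, eps=1e-6):
        """logits: (N, C, H, W); targets: (N, H, W) integer class labels."""
        probs = torch.softmax(logits, dim=1)
        one_hot = F.one_hot(targets.long(), num_classes=probs.shape[1])
        one_hot = one_hot.permute(0, 3, 1, 2).float()
        dims = (0, 2, 3)
        intersection = (probs * one_hot).sum(dims)
        cardinality = probs.sum(dims) + one_hot.sum(dims)
        dice = (2.0 * intersection + eps) / (cardinality + eps)
        return 1.0 - dice.mean()

    def augment(image, mask):
        """Joint random rotation and horizontal/vertical flips.
        image: (3, H, W) float tensor; mask: (1, H, W) integer tensor."""
        angle = random.uniform(-180.0, 180.0)
        image, mask = TF.rotate(image, angle), TF.rotate(mask, angle)
        if random.random() < 0.5:
            image, mask = TF.hflip(image), TF.hflip(mask)
        if random.random() < 0.5:
            image, mask = TF.vflip(image), TF.vflip(mask)
        return image, mask

    model = deeplabv3_resnet50(num_classes=2)  # tumor vs. non-tumor
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # illustrative choice

    def train_step(images, targets):
        """images: (N, 3, 512, 512), resized and normalized; targets: (N, 512, 512)."""
        optimizer.zero_grad()
        logits = model(images)["out"]          # torchvision returns a dict of outputs
        loss = soft_dice_loss(logits, targets)
        loss.backward()
        optimizer.step()
        return loss.item()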

The model’s performance was evaluated for both the tumor and non-tumor classes on the test set. Regions with inter-observer variability were not considered in these scores: for each pixel outside these regions, the ground-truth classification was compared to the class with the greater probability in the model’s prediction.
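In code, this amounts to masking out the uncertain pixels before scoring, as in this sketch (using the 0/1/2 label convention from the earlier sketch; evaluate_batch is a hypothetical helper):

    import torch

    @torch.no_grad()
    def evaluate_batch(model, images, labels):
        """labels: (N, H, W) with 0 = non-tumor, 1 = tumor, 2 = uncertain."""
        logits = model(images)["out"]          # (N, 2, H, W)
        preds = logits.argmax(dim=1)           # class with the greater probability
        certain = labels != 2                  # exclude inter-observer variability
        correct = (preds[certain] == labels[certain]).float()
        return correct.mean().item()           # pixel accuracy on certain pixels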

Results

A representative example of the model’s output is displayed in Figure 1, and its performance on the test set is shown in Table 1.


There was significant variability among the three observers’ contours. Most contours had a central region where all observers agreed on the presence of tumor, but showed significant disagreement at the edges of the tumor.

Manual contouring took on average 10 seconds per image, while the deep learning model segmented an image in 0.009 ± 0.004 seconds on the test set. Preliminary tests confirmed that real-time tumor segmentation from a video feed is possible at this rate.
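For reference, per-image inference time could be measured along these lines (an illustrative sketch; the abstract does not describe the hardware or timing protocol):

    import time
    import torch

    @torch.no_grad()
    def time_inference(model, image, n_runs=100):
        """image: (1, 3, 512, 512). Returns mean seconds per image."""
        model.eval()
        model(image)                    # warm-up pass
        if image.is_cuda:
            torch.cuda.synchronize()    # wait for pending GPU kernels
        start = time.perf_counter()
        for _ in range(n_runs):
            model(image)
        if image.is_cuda:
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / n_runs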

Conclusion

Manual tumor segmentation in endoscopy images of rectal cancer patients is prone to significant inter-observer variability, with particular uncertainty in discerning the edges of a tumor.

Deep learning can be used not only to detect regions that are likely to contain tumors but also to accurately estimate the regions most likely to cause inter-observer variability. Because the model can run in real time and accurately displays tumor regions with their associated uncertainty, this method is appealing for clinical implementation.

In future studies, the model’s generalizability will be investigated using data from different types of cameras, observers from more institutions, and additional classes.