Vienna, Austria

ESTRO 2023

Session Item

Saturday
May 13
09:00 - 10:00
Stolz 2
TCP/NTCP modelling and prediction
Karen Kirkby, United Kingdom;
Nienke Hoekstra, The Netherlands
Mini-Oral
Physics
Independent external validation of four NTCP models for head and neck cancer patients
Francesca Emiro, Italy
MO-0061

Abstract

Independent external validation of four NTCP models for head and neck cancer patients
Authors:

Francesca Emiro1, Cecilia Bonfiglio2,3, Maria Giulia Vincini4, Silvano Vignati5, Sara Gandini5, Mattia Zaffaroni2, Stefania Volpe2,6, Giulia Marvaso2,6, Domenico Genovesi3, Giovanni Luca Gravina7,8, Daniela Alterio2, Barbara Alicja Jereczek-Fossa2,6

1European Institute of Oncology IRCCS, Unit of Medical Physics, Milan, Italy; 2European Institute of Oncology IRCCS, Division of Radiation Oncology, Milan, Italy; 3SS. Annunziata Hospital, “G. D’Annunzio” University of Chieti, Radiation Oncology Unit, Chieti, Italy; 4European Institute of Oncology IRCCS, Division of Radiation Oncology, Milan, Italy; 5European Institute of Oncology IRCCS, Department of Experimental Oncology, Milan, Italy; 6University of Milan, Department of Oncology and Hemato-Oncology, Milan, Italy; 7University of L'Aquila, Laboratory of Radiobiology, Department of Biotechnological and Applied Clinical Sciences, L'Aquila, Italy; 8University of L'Aquila, Division of Radiation Oncology, Department of Biotechnological and Applied Clinical Sciences, L'Aquila, Italy

Purpose or Objective

We aimed to externally validate normal tissue complication probability (NTCP) models for radiation-related toxicity in head and neck (HN) cancer patients (pts) treated with curative radiotherapy (RT).

Material and Methods

This was a retrospective analysis of pts treated from 2015 to 2021. Four NTCP models were evaluated: 1) physician-rated swallowing dysfunction at 6 months (m) (Christianen et al.); 2) tube feeding dependence at 6 m (Wopken et al.); 3) incidence of G≥2 laryngeal edema within 15 m from RT (Rancati et al.); 4) acute oral and oropharyngeal mucositis (OM) G≥3 at any time during RT, and mean OM G≥1.5 over the RT treatment weeks (Orlandi et al.). Pts who had previously undergone oncological treatment in the HN district and/or had distant metastases and/or synchronous tumors, or who were treated with an altered-fractionation RT schedule, were excluded. Toxicities were retrieved from electronic medical charts. Validation analyses included: 1) overall model performance using the Brier score; 2) discrimination ability using the area under the receiver operating characteristic curve (AUC); 3) calibration using the calibration intercept and slope; 4) the Hosmer–Lemeshow goodness-of-fit test to evaluate the calibration of the model. A model's predictions were considered to fit the data at an acceptable level if the p-value of the Hosmer–Lemeshow goodness-of-fit test was >0.05.
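
As an illustration only, the four validation metrics named in this section could be computed along the following lines. This is a minimal sketch and not the authors' code: the function names, the joint Newton-Raphson fit of intercept and slope, the equal-size grouping for the Hosmer–Lemeshow statistic, and the closed-form chi-square tail (valid only for an even number of degrees of freedom) are all assumptions made for this example. Inputs are binary outcomes `y` (0/1) and predicted NTCP values `p` strictly between 0 and 1.

```python
import math

def brier_score(y, p):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    """Probability that a random event is scored above a random non-event (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def calibration_intercept_slope(y, p, iters=50):
    """Fit logit(P(y=1)) = a + b * logit(p) by Newton-Raphson; returns (a, b).
    Assumption: intercept and slope are estimated jointly."""
    x = [math.log(pi / (1 - pi)) for pi in p]
    a, b = 0.0, 1.0  # start from perfect calibration
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi in zip(y, x):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

def hosmer_lemeshow_statistic(y, p, groups=10):
    """Chi-square statistic comparing observed and expected events in
    equal-size groups of pts ordered by predicted risk."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(pi for pi, _ in chunk)
        observed = sum(yi for _, yi in chunk)
        m = len(chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / m))
    return stat

def chi2_sf_even_df(x, df):
    """Upper tail of the chi-square distribution; closed form for even df only."""
    if df % 2 != 0 or df <= 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
```

With ten groups, a p-value could then be obtained as `chi2_sf_even_df(hosmer_lemeshow_statistic(y, p), 8)`, where the choice of `groups - 2` degrees of freedom is itself an assumption; some external-validation analyses use a different number.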


Results

150 pts were eligible for the analyses; pt and tumor characteristics are reported in Table 1. Of these, the pts available for each NTCP analysis, and the incidence of the corresponding toxicity, were distributed as follows: 1) 97 pts for dysphagia at 6 m (5% events); 2) 88 pts for tube feeding dependence (3% events); 3) 102 pts for G≥2 laryngeal edema (21% events); 4) 113 pts for mucositis G≥3 (42% events); 5) 114 pts for mean mucositis G≥1.5 (63% events). The AUCs of all NTCP models are reported in Fig. 1. Considering all validation tests, all models except the Christianen model showed adequate fit. The best discrimination was reached by the Wopken model (AUC = 0.73), but the limited number of cases affects the precision of the estimate, as shown by the wide confidence interval (95% CI: 0.67-0.98). No model showed evidence of good calibration. Nonparametric tests comparing model scores between pts with and without events showed a significant difference only for the Rancati model (p-value = 0.02 for laryngeal edema). The low number of events for the NTCP models of dysphagia at 6 m and tube feeding dependence prevented statistically robust results.
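
To make the remark about low event numbers concrete, the approximate event counts implied by the reported cohort sizes and event percentages can be worked out as below. This is a rough sketch: the percentages in the abstract are rounded, so the counts are estimates rather than reported figures, and the dictionary labels are chosen here for illustration.

```python
# Cohort size and rounded event percentage for each endpoint, as reported in the Results.
cohorts = {
    "dysphagia at 6 m": (97, 5),
    "tube feeding dependence": (88, 3),
    "laryngeal edema G>=2": (102, 21),
    "mucositis G>=3": (113, 42),
    "mean mucositis G>=1.5": (114, 63),
}

# Approximate number of events: cohort size times event rate, rounded to a whole patient.
approx_events = {name: round(n * pct / 100) for name, (n, pct) in cohorts.items()}

for name, events in approx_events.items():
    print(f"{name}: about {events} events")
```

Under these assumptions, the dysphagia and tube-feeding endpoints rest on roughly 5 and 3 events respectively, which is consistent with the wide confidence interval reported for the Wopken model.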

Conclusion

All models (except Orlandi OM mean G≥1.5) suggest good discrimination, and all (except Christianen) a sufficient fit. Calibration, considered as the distance between observed and predicted probabilities, is poor, probably because of the low number of events. Further analyses are ongoing to confirm the performance assessment of both the dysphagia and tube-feeding NTCP models.