A Risk Prediction Scheme for Secondary Primary Esophageal Squamous Cell Carcinoma in Head and Neck Cancer Survivors

Authors

  • Chun-Chia Chen Institute of Medicine, Chung Shan Medical University, Taichung, 40201, Taiwan https://orcid.org/0000-0003-2692-1812
  • Ming-Yi Lu School of Dentistry, College of Oral Medicine, Chung Shan Medical University, Taichung, 40201, Taiwan
  • Chi-Chang Chang School of Medical Informatics, Chung Shan Medical University & IT Office, Chung Shan Medical University Hospital, Taichung, 40201, Taiwan

DOI:

https://doi.org/10.37256/cm.6520255753

Keywords:

prediction scheme, secondary primary esophageal squamous cell carcinoma, head and neck cancer, clinical risk factors, machine learning classifiers, CART

Abstract

In this study, we developed a machine learning scheme to predict the occurrence of Second Primary Esophageal Squamous Cell Carcinoma (SPESC) among patients with primary Head and Neck Cancer (HNC). We retrospectively collected 2,863 records of patients with HNC, including 65 cases of SPESC. Data on 19 candidate factors were analyzed to identify significant risk factors and protective factors for SPESC. On the basis of gain ratios, the following significant risk factors were identified for the occurrence of SPESC in patients with HNC: age at HNC diagnosis < 65 years, tumor grade/differentiation > 2, smoking behavior, drinking behavior, and existence of a tumor depth measurement in the pathology report. The only significant protective factor for SPESC was Body Mass Index (BMI) ≥ 24. These factors were used as inputs to seven machine learning algorithms to predict the occurrence of SPESC: C4.5, C5.0, support vector machine, random forest, Classification and Regression Trees (CART), linear discriminant analysis, and logistic regression. Among these seven algorithms, CART exhibited the largest area under the receiver operating characteristic curve (0.9240); the highest accuracy (0.8678), recall rate (0.8573), F1 score (0.8871), and Matthews correlation coefficient (0.7305); and the lowest false positive rate (0.1990) for SPESC detection. Thus, CART was selected as the machine learning algorithm of the proposed model. The highest model accuracy of 88.0% was achieved under the following candidate factor conditions: drinking behavior: yes, age at diagnosis: < 65 years, smoking behavior: yes, surgery: no, tumor size: ≤ 4 cm, BMI: < 24, radiotherapy: yes, and tumor grade/differentiation: > 2.
Overall, the developed prediction scheme enables the early and accurate detection of SPESC among patients with HNC, thus improving patient outcomes and ensuring that patients with a low risk of developing SPESC do not need to undergo unnecessary invasive procedures.
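
For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how accuracy, recall, false positive rate, F1 score, and the Matthews correlation coefficient can be derived from the four cells of a binary confusion matrix. The function name binary_metrics and the example counts are hypothetical and are not taken from the study; the sketch uses the standard textbook definitions and is not the authors' implementation.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives. Ratios with a zero denominator
    are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0           # false positive rate
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # Matthews correlation coefficient: uses all four cells, which makes it
    # informative when one class is much rarer than the other.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "false_positive_rate": fpr,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not data from the study).
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because SPESC cases form a small share of the records (65 of 2,863), accuracy alone can look high even for a weak classifier, so measures such as recall and the Matthews correlation coefficient are worth reading alongside it.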

Published

2025-09-10