A Simulation Study Comparing Tree-Based Methods in Identifying Interactions of Continuous and Binary Variables for Prediction of Increased Risk of Disease

Authors

DOI:

https://doi.org/10.37256/bsr.1120232148

Keywords:

dichotomization, biostatistics, tree graphs, conditional inference, interactions, classification and regression trees, logic regression, optimal cutpoints

Abstract

Tree-based methods are commonly used to build models that predict an outcome from several input variables. Classification and Regression Trees (CART) is a popular algorithm that builds tree-like graphs for predicting continuous and categorical dependent variables, but it has been shown to be biased toward the inclusion of continuous variables. Conditional inference is a technique used to alleviate this bias. C.Logic is an alternative tree-based method that uses Boolean logic to create classification trees. Previous research has shown that C.Logic is superior to CART in identifying interactions that lead to an increased risk of disease. However, no comparison has been made between the C.Logic package and CART with conditional inference, as implemented in the Party package. In this paper, a simulation study compares the ability of these two algorithms to identify interactions between continuous and binary variables. Both methods succeed in identifying the correct interactions, but C.Logic is more effective: it does a better job of alleviating the bias toward continuous variables when identifying interacting variables that lead to an increased risk of disease.
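The bias toward continuous variables mentioned in the abstract can be illustrated with a small, self-contained sketch. This is not the simulation design used in the paper, and it uses neither C.Logic nor Party; it assumes a plain exhaustive Gini split search as a stand-in for how a CART-style root split is chosen. The outcome is generated independently of both predictors, so neither deserves to be selected, yet the continuous predictor offers many candidate cutpoints while the binary predictor offers only one. All function names, sample sizes, and the seed are hypothetical choices made for illustration.

```python
import random


def gini(pos, n):
    """Gini impurity of a node holding `pos` positive cases out of `n`."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def split_gain(left_pos, left_n, total_pos, total_n):
    """Decrease in Gini impurity produced by one binary split."""
    right_pos, right_n = total_pos - left_pos, total_n - left_n
    weighted = (left_n * gini(left_pos, left_n)
                + right_n * gini(right_pos, right_n)) / total_n
    return gini(total_pos, total_n) - weighted


def best_gain_continuous(x, y):
    """Largest gain over every cutpoint of a continuous predictor."""
    pairs = sorted(zip(x, y))
    total_pos, n = sum(y), len(y)
    best, left_pos = 0.0, 0
    for i in range(n - 1):  # cut between the i-th and (i+1)-th sorted values
        left_pos += pairs[i][1]
        best = max(best, split_gain(left_pos, i + 1, total_pos, n))
    return best


def gain_binary(b, y):
    """Gain of the single split available on a binary predictor."""
    left_pos = sum(yi for bi, yi in zip(b, y) if bi == 0)
    left_n = sum(1 for bi in b if bi == 0)
    return split_gain(left_pos, left_n, sum(y), len(y))


def continuous_win_rate(n_reps=500, n=100, seed=1):
    """Share of replicates in which the continuous noise predictor wins."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_reps):
        x = [rng.random() for _ in range(n)]        # continuous noise
        b = [rng.randint(0, 1) for _ in range(n)]   # binary noise
        y = [rng.randint(0, 1) for _ in range(n)]   # outcome unrelated to x and b
        if best_gain_continuous(x, y) > gain_binary(b, y):
            wins += 1
    return wins / n_reps


rate = continuous_win_rate()
print(f"continuous noise predictor chosen in {rate:.0%} of replicates")
```

With an unbiased selection rule the two noise predictors would be chosen about equally often; a win rate well above one half for the continuous predictor reflects the selection bias that conditional inference is meant to reduce.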

Published

2023-04-18

How to Cite

1. Prince Nelson S. A Simulation Study Comparing Tree-Based Methods in Identifying Interactions of Continuous and Binary Variables for Prediction of Increased Risk of Disease. Biostatistics Research [Internet]. 2023 Apr. 18 [cited 2024 Apr. 24];1(1):72-7. Available from: https://ojs.wiserpub.com/index.php/BSR/article/view/2148