A Simulation Study Comparing Tree-Based Methods in Identifying Interactions of Continuous and Binary Variables for Prediction of Increased Risk of Disease
DOI:
https://doi.org/10.37256/bsr.1120232148
Keywords:
dichotomization, biostatistics, tree graphs, conditional inference, interactions, classification and regression trees, logic regression, optimal cutpoints
Abstract
Tree-based methods are commonly used to create models that predict an output based on several input variables. Classification and Regression Trees (CART) is a popular algorithm that builds tree-like graphs for predicting continuous and categorical dependent variables, but it has been shown to be biased toward the inclusion of continuous variables. Conditional inference is a technique used to alleviate this bias. C.Logic is an alternative tree-based method that uses Boolean logic to create classification trees. Previous research has shown that C.Logic is superior to CART in identifying interactions that lead to an increased risk of disease. However, no comparison has been made between the C.Logic package and CART with conditional inference, as implemented in a package called Party. In this paper, a simulation study is used to compare the capability of these two algorithms to identify interactions between continuous and binary variables. Both methods succeed in identifying the correct interactions, but C.Logic is more effective: it does a better job of alleviating the bias toward continuous variables when attempting to identify interacting variables that lead to an increased risk of disease.
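To make the kind of interaction studied here concrete, the following is a minimal sketch of simulated data in which disease risk is elevated only when a continuous variable exceeds a cutpoint and a binary variable equals 1. It is not the simulation design used in the paper: the function names, the cutpoint of 0.5, the risk levels of 0.1 and 0.6, the uniform distribution, and the sample size are all hypothetical values chosen for illustration.

```python
import random


def simulate(n=20000, cutpoint=0.5, base_risk=0.1, high_risk=0.6, seed=0):
    """Simulate (x1, x2, y) rows with an interaction effect on disease risk.

    All parameter values are illustrative assumptions, not the paper's design.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.random()        # continuous predictor, uniform on [0, 1)
        x2 = rng.randint(0, 1)   # binary predictor
        # Risk is elevated only when BOTH conditions hold (the interaction).
        risk = high_risk if (x1 > cutpoint and x2 == 1) else base_risk
        y = 1 if rng.random() < risk else 0
        rows.append((x1, x2, y))
    return rows


def cell_risks(rows, cutpoint=0.5):
    """Empirical disease proportion in each (x1 > cutpoint, x2) cell."""
    counts = {}
    for x1, x2, y in rows:
        key = (x1 > cutpoint, x2)
        total, diseased = counts.get(key, (0, 0))
        counts[key] = (total + 1, diseased + y)
    return {key: diseased / total for key, (total, diseased) in counts.items()}


rows = simulate()
risks = cell_risks(rows)
for key in sorted(risks):
    print(key, round(risks[key], 3))
```

In data like these, a method that identifies the interaction correctly would have to recover both the dichotomized continuous variable and the binary variable; a method biased toward continuous variables might split repeatedly on `x1` and miss `x2`.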
License
Copyright (c) 2023 Sybil Prince Nelson
This work is licensed under a Creative Commons Attribution 4.0 International License.