Navigating Multicollinearity in Linear Regression Models: Implications for Big Data Analysis
DOI:
https://doi.org/10.37256/cm.7320268305
Keywords:
big data, cross-validation, multicollinearity, multiple linear regression, sufficient statistics
Abstract
The consequences of multicollinearity in regression analysis involving small, moderate, or high-dimensional datasets are well established, and many notable solutions exist. However, its consequences for big data, specifically data with a large number of observations, are not. In this paper, we determine the impact of multicollinearity on the linear regression model applied to big data by numerically evaluating the bias, variance, and signs of the estimated regression coefficients. An extensive simulation study shows that multicollinearity does not substantially alter these measures. We also apply the analysis to a real-world dataset to demonstrate the method.
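To make the evaluated quantities concrete, the following is a minimal sketch of a Monte Carlo check of bias, variance, and sign agreement of ordinary least squares coefficients under correlated predictors. It is not the authors' simulation design: the function name `simulate_ols`, the equicorrelated predictor structure, the no-intercept model, and all parameter values are assumptions chosen for illustration. The fit uses only the cross-products X'X and X'y, which are the sufficient statistics for the normal linear model.

```python
import numpy as np


def simulate_ols(n, rho, beta, n_reps=200, sigma=1.0, seed=0):
    """Monte Carlo bias, variance, and sign agreement of OLS estimates.

    Hypothetical helper for illustration only. Predictors are drawn with a
    common pairwise correlation `rho` (an assumed design), and the model has
    no intercept.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    p = beta.size

    # Equicorrelation matrix: ones on the diagonal, rho elsewhere.
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    chol = np.linalg.cholesky(cov)

    estimates = np.empty((n_reps, p))
    for r in range(n_reps):
        # Correlated predictors and a response with Gaussian noise.
        X = rng.standard_normal((n, p)) @ chol.T
        y = X @ beta + sigma * rng.standard_normal(n)

        # OLS through the cross-products X'X and X'y.
        xtx = X.T @ X
        xty = X.T @ y
        estimates[r] = np.linalg.solve(xtx, xty)

    bias = estimates.mean(axis=0) - beta
    variance = estimates.var(axis=0, ddof=1)
    # Share of replications in which each estimate has the true sign.
    sign_match = (np.sign(estimates) == np.sign(beta)).mean(axis=0)
    return bias, variance, sign_match


if __name__ == "__main__":
    true_beta = [1.0, -0.5, 2.0]
    for n in (50, 20_000):
        bias, variance, sign_match = simulate_ols(n=n, rho=0.95, beta=true_beta)
        print(f"n={n}: bias={bias}, variance={variance}, sign_match={sign_match}")
```

Running the loop at a small and a large `n` with the same `rho` gives a rough sense of how the three measures change as the number of observations grows while the correlation among predictors stays fixed.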
License
Copyright (c) 2026 Salomi du Plessis, et al.

This work is licensed under a Creative Commons Attribution 4.0 International License.
