Mutual Exploration for Missing Data Imputation, QoS Parameter Selection, and QoS Prediction in 5G Networks Using a Novel Skewness Driven Distribution Imputation Algorithm, Pearson Correlation, and XGBoost
DOI:
https://doi.org/10.37256/cnc.2220245534
Keywords:
skewness driven distribution imputation, vehicle to everything network, 5G QoS, machine learning
Abstract
Pre-processing is a key stage in the Machine Learning (ML) pipeline, in which data is prepared and organized before being fed to ML models for a prediction task. One problem that may occur at this stage is missing values, which requires either deleting the affected data or imputing it with data points that resemble or correlate with the original ones. Imputation is preferable, because feeding more data to an ML model gives it more context and thus better prediction results. Since the goal of imputation is for the imputed data to relate faithfully to the original data, a correlation metric is a desirable way to show that the two are closely correlated. Herein, we present a novel imputation algorithm, Skewness Driven Distribution Imputation (SDDI), and evaluate its efficacy against multiple state-of-the-art methods, including K-Nearest Neighbors (KNN), Mean, Mode, and Forward and Backward Fill (F&B-Fill) imputation. The comparison uses accuracy, Root Mean Square Error (RMSE), correlation, and computation time. Furthermore, a correlation analysis is conducted on a 5G Vehicle-to-Everything (V2X) Quality of Service (QoS) dataset, aiming to improve the understanding of parameter selection for assessing the QoS of 5G networks by comparing how strongly, in terms of correlation, various parameters influence QoS. The state-of-the-art Pearson correlation method is used for this purpose. Moreover, we use the Extreme Gradient Boosting (XGBoost) algorithm, an ensemble method that is not as complex as deep learning algorithms, to predict QoS given certain conditions relevant to the 5G network.
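To make the baseline comparison concrete, the following is a minimal sketch of three of the baseline imputers named above (Mean, Mode, and F&B-Fill), scored with RMSE and Pearson correlation against a known ground truth. It does not reproduce SDDI, whose details are not given in this abstract. The toy series, the function names, and the choice to score over the full series rather than only the imputed positions are all assumptions made for illustration.

```python
import math
from statistics import mean, mode

def impute_mean(values):
    """Replace each None with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def impute_mode(values):
    """Replace each None with the most frequent observed value."""
    fill = mode(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def impute_ffill_bfill(values):
    """Forward fill, then backward fill any gaps left at the start."""
    out = list(values)
    for i in range(1, len(out)):            # forward pass
        if out[i] is None:
            out[i] = out[i - 1]
    for i in range(len(out) - 2, -1, -1):   # backward pass
        if out[i] is None:
            out[i] = out[i + 1]
    return out

def rmse(truth, imputed):
    """Root mean square error over the full series (an assumption)."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(truth, imputed)))

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical toy series: ground truth and a copy with two values removed.
truth = [10.0, 12.0, 12.0, 18.0, 20.0, 12.0]
observed = [None, 12.0, 12.0, 18.0, None, 12.0]

for name, imputer in [("Mean", impute_mean),
                      ("Mode", impute_mode),
                      ("F&B-Fill", impute_ffill_bfill)]:
    filled = imputer(observed)
    print(name, round(rmse(truth, filled), 3), round(pearson(truth, filled), 3))
```

On real data the same loop would run once per feature column, and the per-column scores would be averaged to obtain a single figure per method.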
The comparative analysis of the imputation methods showed average correlation values (as a measure of faithful data imputation) that were relatively close: 0.161 for Mean, 0.176 for Mode, 0.143 for F&B-Fill, 0.143 for SDDI, and 0.196 for KNN. In terms of accuracy, all methods achieved high rates: 93% for Mean and Mode, 90% for F&B-Fill, and 92% for both SDDI and KNN. Notably, in the second part of the research, when only the 15 most correlated features were used, we observed a 60.5% reduction in the amount of data affected, with only a minimal impact of 3% on accuracy, achieving 93% accuracy. These results highlight the effectiveness of targeted feature selection of parameters for QoS of 5G networks and underscore the potential of our novel SDDI method in maintaining high data integrity while efficiently handling missing data, thereby enhancing the predictive reliability of the XGBoost algorithm.
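The selection of the most correlated features can be illustrated with a short sketch that ranks candidate features by the absolute value of their Pearson correlation with the target and keeps the top k. The feature names (`rsrp`, `speed`, `noise`), the toy values, and the use of the absolute value for ranking are assumptions for illustration; the abstract does not state how the 15 features were ranked.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def top_k_features(columns, target, k):
    """Return the k feature names with the largest |r| against the target."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical feature columns and QoS target values.
columns = {
    "rsrp":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "speed": [2.0, 1.0, 4.0, 3.0, 5.0],
    "noise": [3.0, 1.0, 2.0, 1.0, 3.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]

selected = top_k_features(columns, target, k=2)
print(selected)
```

With k set to 15 on the full feature set, the resulting list would define the reduced input passed to the predictor.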
License
Copyright (c) 2024 Saifullah Khan, et al.
This work is licensed under a Creative Commons Attribution 4.0 International License.