Uniform Resource Locator Classification Using Classical Machine Learning & Deep Learning Techniques
DOI: https://doi.org/10.37256/ccds.4120231847
Keywords: URL classification, feature extraction, random forest, machine learning, deep learning
Abstract
In the Internet era, the Internet has undoubtedly helped us in many ways, not least by providing a means to communicate with anyone around the world. That said, some people misuse this technology to conduct malicious activities. Many things can be exploited to carry out such acts; this work focuses on exploitation methods that use the uniform resource locator (URL). This paper presents a means of extracting features from a raw URL, which are then used to predict whether the URL is safe for a user to visit. The whole process of extracting the data and preparing it for a model is discussed in detail. Several machine learning (ML) models were trained using different algorithms, including CatBoost, Random Forest, and Decision Tree, and several feedforward deep neural network models were also explored. The best model, a deep learning model, achieved an accuracy of 95.61% on a test set.
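As an illustration of what "extracting features from a raw URL" can look like, the sketch below computes a handful of common lexical features using only the Python standard library. The abstract does not list the paper's actual feature set, so the function name `extract_url_features` and every feature in it are hypothetical examples, not the authors' method.

```python
import ipaddress
from typing import Dict
from urllib.parse import parse_qsl, urlparse


def extract_url_features(url: str) -> Dict[str, int]:
    """Compute illustrative lexical features from a raw URL string.

    Hypothetical feature set for demonstration only; it is not the
    feature set used in the paper.
    """
    # Prepend a scheme if missing so urlparse places the host in netloc.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""

    # Flag URLs whose host is a literal IP address instead of a domain name.
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": host_is_ip,
        # Count non-empty path segments, e.g. "/a/b" -> 2.
        "num_path_segments": len([s for s in parsed.path.split("/") if s]),
        "num_query_params": len(parse_qsl(parsed.query, keep_blank_values=True)),
    }


if __name__ == "__main__":
    for u in ["https://example.com/a/b?x=1&y=2",
              "http://192.168.0.1/login@secure-bank"]:
        print(u, extract_url_features(u))
```

Each URL becomes a fixed-length numeric record, so a list of such records (for example `[list(extract_url_features(u).values()) for u in urls]`) can be fed to tabular classifiers such as Random Forest or CatBoost, or to a feedforward network. Purely lexical features are cheap to compute because they need no network lookup, which is one reason this style of feature is popular for URL classification.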
License
Copyright (c) 2022 Aws Rayyan, Mohammad Ghassan Aburas, Amjed Al-Mousa
This work is licensed under a Creative Commons Attribution 4.0 International License.