Uniform Resource Locator Classification Using Classical Machine Learning & Deep Learning Techniques

DOI:

https://doi.org/10.37256/ccds.4120231847

Keywords:

URL classification, feature extraction, random forest, machine learning, deep learning

Abstract

The Internet has made it possible to communicate with anyone around the world. That said, some people misuse this technology for malicious purposes. Many attack vectors exist, but this work focuses on exploitation methods that use the uniform resource locator (URL). This paper presents a means of extracting features from a raw URL; these features are used to predict whether a URL is safe for a user to visit. The full process of extracting the data and preparing it for a model is discussed in detail. Several machine learning (ML) models were trained using different algorithms, including CatBoost, random forest, and decision trees, and several feedforward deep neural network models were also explored. The best model, a deep learning model, achieved an accuracy of 95.61% on a test set.
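As a rough illustration of what extracting features from a raw URL can look like, the following Python sketch computes a handful of simple lexical features. The function name `extract_url_features` and the specific features chosen here are assumptions made for illustration only; the abstract does not list the feature set used in the paper.

```python
import ipaddress
from urllib.parse import urlparse


def _is_ip_address(host: str) -> bool:
    """Return True if the host string is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url: str) -> dict:
    """Compute illustrative lexical features from a raw URL.

    The feature set is hypothetical and not taken from the paper.
    """
    # Add a default scheme so urlparse can locate the host reliably.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "uses_https": int(parsed.scheme == "https"),
        "has_ip_host": int(_is_ip_address(host)),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        "has_at_symbol": int("@" in url),
    }


if __name__ == "__main__":
    print(extract_url_features("https://example.com/a/b?x=1"))
```

Each resulting dictionary can be treated as one row of a feature table, which is the kind of numeric input that tree-based classifiers and feedforward networks expect.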

Published

2022-10-31

How to Cite

1. Rayyan A, Aburas MG, Al-Mousa A. Uniform Resource Locator Classification Using Classical Machine Learning & Deep Learning Techniques. Cloud Computing and Data Science [Internet]. 2022 Oct. 31 [cited 2024 Dec. 23];4(1):17-30. Available from: https://ojs.wiserpub.com/index.php/CCDS/article/view/1847