Mining the High Dimensional Biological Dataset Using Optimized Colossal Pattern with Dimensionality Reduction

Authors

  • T. Sreenivasula Reddy Department of Computer Science and Engineering, Faculty of Engineering and Technology, Annamalai University, Annamalai Nagar, Chidambaram, Tamil Nadu 608002, India https://orcid.org/0000-0002-4887-2575
  • R. Sathya Assistant Professor, Department of Computer Science and Engineering, Faculty of Engineering and Technology, Annamalai University, Annamalai Nagar, Chidambaram, Tamil Nadu-608002, India
  • Mallikhanjuna Rao Nuka Department of Computer Applications, Annamacharya Institute of Technology & Sciences Rajampet, YSR Kadapa, Andhra Pradesh-516115, India. 3mallikharjuna.nuka@gmail.com https://orcid.org/0000-0001-6510-5979

DOI:

https://doi.org/10.37256/cm.5120242460

Keywords:

colossal itemsets, frequent pattern mining, intuitionistic fuzzy rough set, differential evolutionary arithmetic optimization algorithm, fruit fly, random search

Abstract

Mining colossal itemsets from high-dimensional databases has attracted considerable attention in recent years. Traditional algorithms spend a long time mining small and mid-sized itemsets, which do not contain the complete and relevant information needed for decision making. Many applications, particularly in bioinformatics, benefit greatly from the extraction of Frequent Colossal Closed Itemsets (FCCI) from a large dataset. Present preprocessing strategies for FCCI extraction fail to remove all irrelevant features and rows from the dataset. In addition, most current algorithms of this kind are sequential and computationally expensive. This paper presents two alternative dimensionality reduction strategies that prune all irrelevant features and rows from a high-dimensional dataset. An Equilibrium Optimizer (EO) then identifies the optimal threshold value for the reduced features. If the feature value is smaller than this threshold, a frequent-pattern mining algorithm based on the intuitionistic fuzzy rough set (IFRS), in conjunction with the Fruit Fly Algorithm (FFA), is used to discover frequent items and build association rules. If the feature value exceeds the optimal threshold, optimized length constraints (LC) are used to solve the colossal pattern (CP) mining problem. Random search identifies the optimal threshold values of these constraints, and the colossal patterns are extracted using the Differential Evolutionary Arithmetic Optimization Algorithm. The experiments are carried out on twenty biological datasets extracted from the UCI website, and the proposed models are validated in terms of various metrics.
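
To make the terms in the abstract concrete, the following is a minimal sketch of what a length-constrained frequent closed itemset is, using brute-force row enumeration on a toy dataset. It is not the authors' method: it includes no Equilibrium Optimizer, IFRS, Fruit Fly Algorithm, or Differential Evolutionary Arithmetic Optimization step. The function name `closed_itemsets`, the parameters `min_support` and `min_length`, and the item labels `g1` to `g6` are all hypothetical, and both thresholds are fixed by hand here rather than optimized.

```python
from itertools import combinations


def closed_itemsets(rows, min_support, min_length):
    """Return {itemset: support} for closed itemsets that occur in at least
    min_support rows and contain at least min_length items.

    Illustrative brute force only: exponential in the number of rows.
    """
    min_support = max(1, min_support)  # an intersection needs at least one row

    # Pre-pruning (assumed, simplified): drop items that are too rare,
    # then drop rows that became too short to hold a qualifying pattern.
    counts = {}
    for row in rows:
        for item in row:
            counts[item] = counts.get(item, 0) + 1
    keep = {item for item, c in counts.items() if c >= min_support}
    pruned = [frozenset(row) & keep for row in rows]
    pruned = [row for row in pruned if len(row) >= min_length]

    # Row enumeration: the intersection of any group of rows is a closed itemset.
    result = {}
    for k in range(min_support, len(pruned) + 1):
        for group in combinations(pruned, k):
            pattern = frozenset.intersection(*group)
            if len(pattern) >= min_length:
                result[pattern] = sum(1 for row in pruned if pattern <= row)
    return result


# Toy data: each row is one sample, each label is one feature (hypothetical).
rows = [
    {"g1", "g2", "g3", "g4"},
    {"g1", "g2", "g3", "g5"},
    {"g1", "g2", "g3", "g4"},
    {"g2", "g6"},
]
result = closed_itemsets(rows, min_support=2, min_length=3)
for pattern, support in sorted(result.items(), key=lambda kv: -len(kv[0])):
    print(sorted(pattern), support)
```

Enumerating groups of rows rather than groups of items reflects the shape of high-dimensional biological data, where rows are few and features are many. The length constraint discards small and mid-sized patterns and keeps only the long ones.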

Published

2024-03-08

How to Cite

1.
Sreenivasula Reddy T, Sathya R, Nuka MR. Mining the High Dimensional Biological Dataset Using Optimized Colossal Pattern with Dimensionality Reduction. Contemp. Math. [Internet]. 2024 Mar. 8 [cited 2024 May 14];5(1):645-64. Available from: https://ojs.wiserpub.com/index.php/CM/article/view/2460