Using Multi-objective Meta-heuristics for Data Mining
Date
2025-01-05
Publisher
University of Bordj Bou Arreridj
Abstract
Extracting knowledge from large datasets, a process known as knowledge extraction or
data mining, is essential for innovation and informed decision-making. Traditional methods
often fail to exploit the full potential of data, which motivates the development of new
algorithms that yield better insights.
This thesis explores an innovative approach by integrating deep learning with
advanced feature selection techniques to improve the classification accuracy of COVID-19
cases from chest X-ray images. The dataset includes X-ray images categorized as
COVID-19, pneumonia, and normal. We employ the Binary Multi-Objective Henry
Gas Solubility Optimization Algorithm (B-MOHGSO) for feature selection and leverage
models like AlexNet, VGG19, GoogleNet, and ResNet for feature extraction. Eight
versions of B-MOHGSO were tested, with k-nearest neighbors (k-NN) and support vector
machines (SVM) as classifiers.
The study highlights the significant impact of S-shaped and V-shaped transfer functions
on binary transformations and classifier performance in high-dimensional medical
imaging. Notably, B-MOHGSO algorithms, particularly those using V-shaped transfer
functions, excelled in selecting relevant features while maintaining high accuracy. When
combined with the VGG19 model and the SVM classifier, B-MOHGSO significantly reduced
the feature set without sacrificing performance.
Applying B-MOHGSO to COVID-19 classification helps identify the key features that
support diagnostic processes and treatment strategies. By adapting MOHGSO to discrete
optimization, this research aims to address the complexities of high-dimensional medical
data and to improve healthcare analytics outcomes.
Description
In this work, we demonstrated that a new multi-objective metaheuristic, MOHGSO,
can significantly improve the efficiency and effectiveness of solving complex optimization
tasks. Extending this algorithm to feature selection leverages its strengths to identify
optimal feature subsets, improving performance in applications such as machine learning
and data mining. These results suggest that MOHGSO-based feature selection is a
promising approach for supervised classification tasks in data mining.
The focal point of this thesis is the development and application of a physics-based
algorithm, the Henry Gas Solubility Optimization (HGSO) algorithm, originally designed
for continuous optimization problems. By exploring physics-inspired optimization
techniques in machine learning and introducing a binary version of HGSO based on gas
solubility principles, this thesis makes significant contributions to multi-objective
optimization and to feature selection in classification tasks.
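The Henry's-law analogy can be made concrete with one iteration of the original single-objective HGSO update, as we understand it:

- Henry's coefficient decays under a temperature schedule.
- Solubility is the product of that coefficient, a constant and the partial pressure.
- Each agent moves toward its cluster's best agent and toward a solubility-scaled global best.

This is a minimal sketch, not the thesis' implementation. The function names and the constants (`T_THETA`, `EPS`, `k`, `alpha`, `beta`) are illustrative assumptions.

```python
import math
import random

T_THETA = 298.15  # reference temperature T^theta (value commonly reported for HGSO; assumed here)
EPS = 0.05        # small constant guarding the gamma ratio (assumed value)


def temperature(t, max_iter):
    """Temperature decays over the run: T(t) = exp(-t / max_iter)."""
    return math.exp(-t / max_iter)


def update_henry(h, c, t, max_iter):
    """Henry's coefficient of a gas type: H(t+1) = H(t) * exp(-C * (1/T(t) - 1/T_theta))."""
    return h * math.exp(-c * (1.0 / temperature(t, max_iter) - 1.0 / T_THETA))


def update_position(x, cluster_best, global_best, f_i, f_best, h_next, p,
                    k=1.0, alpha=1.0, beta=1.0, rng=random):
    """One HGSO-style position update for a single agent (minimization; hypothetical helper)."""
    s = k * h_next * p                                      # solubility S = K * H * P
    gamma = beta * math.exp(-(f_best + EPS) / (f_i + EPS))  # agent's interaction ability
    flag = rng.choice((-1.0, 1.0))                          # direction flag F, adds diversity
    r = rng.random()
    return [xj + flag * r * gamma * (bj - xj)               # pull toward the cluster best
            + flag * r * alpha * (s * gj - xj)              # pull toward the solubility-scaled global best
            for xj, bj, gj in zip(x, cluster_best, global_best)]


# Usage: decay Henry's coefficient once, then move one agent.
rng = random.Random(0)
h = update_henry(0.05, 0.01, t=1, max_iter=100)
x_new = update_position([0.5, -0.2], [0.1, 0.0], [0.0, 0.1], f_i=2.0, f_best=1.0,
                        h_next=h, p=1.0, rng=rng)
```

Because `T(t)` falls below 1, the factor `1/T(t) - 1/T_theta` is positive, so Henry's coefficient shrinks as the run progresses.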
Initially, we developed a population-based metaheuristic that simulates Henry's law
for solving multi-objective optimization problems (MOPs) before applying it to specific
data mining challenges. The performance of our approach relies on two critical factors:
managing multi-external archives when new non-dominated solutions are found, and
selecting guiding individuals (multi-leaders) to direct their teammates toward promising
regions. Both are essential for balancing convergence (finding good solutions) and
diversity (exploring different search areas).
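External-archive management rests on Pareto dominance. The following minimal sketch shows how an archive of non-dominated objective vectors could be updated when a new solution is found, assuming all objectives are minimized. The function names are hypothetical, and capacity limits are left to whichever pruning strategy the algorithm uses.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive, candidate):
    """Return the archive after trying to insert one candidate objective vector."""
    if any(dominates(m, candidate) or m == candidate for m in archive):
        return archive  # dominated or duplicate: archive unchanged
    # Remove members the candidate dominates, then add the candidate.
    return [m for m in archive if not dominates(candidate, m)] + [candidate]


archive = []
for point in [(1, 5), (2, 2), (3, 1), (2, 3), (0, 6)]:
    archive = update_archive(archive, point)
# (2, 3) is rejected because (2, 2) dominates it.
```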
The first part of this work focuses on exploring strategies for archive management
and leader selection to enhance the performance and robustness of our proposed
algorithm. Through the development of MOHGSO, we introduced a novel multi-objective
optimization algorithm based on Henry's law that is effective on continuous optimization
problems. Its performance depends heavily on two crucial elements:
1. Management of multi-external archives for new non-dominated solutions;
2. Selection of guiding individuals (multi-leaders) to direct teammates toward promising regions.
Comparative analyses of two strategies within MOHGSO for MOPs, crowding distance
with random selection versus a grid mechanism with roulette wheel selection, revealed
distinct strengths for each approach. For example, on the ZDT functions both strategies
performed well. Crowding distance showed an advantage on complex variants such as
ZDT6, while the grid mechanism was more efficient on simpler ones such as ZDT1.
Similarly, on the DTLZ functions crowding distance benefited highly multimodal problems
such as DTLZ3, while the grid mechanism suited less complex variants such as DTLZ1.
The UF suite highlighted the robustness of crowding distance across varied complexities.
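The ZDT functions named above are standard bi-objective benchmarks. As a reference point, ZDT1 can be written as below. We assume the customary n = 30 decision variables in [0, 1], which may differ from the configuration used in the thesis. The Pareto-optimal front is reached when x2, ..., xn are all 0, giving f2 = 1 - sqrt(f1).

```python
import math


def zdt1(x):
    """ZDT1 benchmark: decision vector x in [0, 1]^n, returns (f1, f2) to minimize."""
    n = len(x)
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)  # g = 1 exactly on the Pareto-optimal front
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2


on_front = zdt1([0.25] + [0.0] * 29)  # a Pareto-optimal point
off_front = zdt1([1.0] * 30)          # a poor point, far from the front
```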
A comparative analysis evaluated MOHGSO against the MOGWO and MSSA algorithms
on four engineering design problems: four-bar truss design, speed reducer design, disk
brake design, and welded beam design. The results indicate that our proposed algorithm
outperforms its competitors on all four problems.
The crowding distance with random selection strategy maintained diversity effectively
in complex, high-dimensional problems but occasionally sacrificed convergence speed.
Conversely, the grid mechanism with roulette wheel selection excelled in low-dimensional
problems but struggled with high-dimensional variants due to increased computational costs.
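The crowding-distance strategy can be sketched as follows. Crowding distance is computed in the NSGA-II style, and the archive is pruned by repeatedly dropping its most crowded member. Leaders are then drawn uniformly at random.

The pruning loop, the function names and the number of leaders `k` are assumptions for illustration. The exact MOHGSO procedure is not given in the text.

```python
import random


def crowding_distance(front):
    """NSGA-II style crowding distance for a list of objective vectors."""
    n = len(front)
    if n == 0:
        return []
    dist = [0.0] * n
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")  # boundary points are always kept
        if hi == lo:
            continue
        for pos in range(1, n - 1):
            gap = front[order[pos + 1]][m] - front[order[pos - 1]][m]
            dist[order[pos]] += gap / (hi - lo)  # normalized neighbour gap
    return dist


def prune_archive(archive, capacity):
    """Drop the most crowded member until the archive fits its capacity."""
    archive = list(archive)
    while len(archive) > capacity:
        d = crowding_distance(archive)
        archive.pop(d.index(min(d)))
    return archive


def select_leaders(archive, k, rng=random):
    """Random selection: draw k distinct leaders uniformly from the archive."""
    return rng.sample(archive, min(k, len(archive)))


front = [(0.0, 1.0), (0.5, 0.5), (0.6, 0.4), (1.0, 0.0)]
pruned = prune_archive(front, 3)  # (0.6, 0.4) sits in the most crowded region
leaders = select_leaders(pruned, 3, random.Random(1))
```

Keeping boundary points at infinite distance protects the extremes of the front, which is one reason this strategy tends to preserve diversity.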
Comparative analysis of two strategies within MOHGSO revealed distinct strengths:
1. Crowding distance with random selection:
• Demonstrated strong performance in maintaining diversity;
• Excelled in complex, high-dimensional, or highly multimodal problems;
• Showed flexibility across various problem types;
• Occasionally sacrificed convergence speed for diversity.
2. Grid mechanism with roulette wheel:
• Excelled in ensuring even distribution of solutions in low-dimensional problems;
• Balanced diversity and convergence effectively for simpler MOPs;
• Showed decreased efficiency as objectives increased due to computational demands.
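The grid mechanism with roulette wheel selection can be sketched MOPSO/MOGWO-style:

1. Objective space is divided into hypercubes.
2. Archive members are counted per occupied cell.
3. A cell is chosen by roulette wheel, with probability proportional to 1 / count^beta, so sparse regions are favoured.
4. A leader is drawn from that cell.

The number of divisions, `beta`, and the function names are illustrative assumptions.

```python
import random
from collections import Counter


def grid_index(obj, lower, upper, divisions):
    """Hypercube coordinates of an objective vector on an adaptive grid."""
    idx = []
    for f, lo, hi in zip(obj, lower, upper):
        width = (hi - lo) / divisions or 1.0  # avoid zero-width cells
        idx.append(min(int((f - lo) / width), divisions - 1))
    return tuple(idx)


def roulette_leader(archive, divisions=10, beta=1.0, rng=random):
    """Pick one leader; sparsely populated cells get larger roulette-wheel slices."""
    m = len(archive[0])
    lower = [min(a[k] for a in archive) for k in range(m)]
    upper = [max(a[k] for a in archive) for k in range(m)]
    cells = [grid_index(a, lower, upper, divisions) for a in archive]
    counts = Counter(cells)
    occupied = list(counts)
    weights = [1.0 / counts[c] ** beta for c in occupied]  # p proportional to 1 / count**beta
    cell = rng.choices(occupied, weights=weights, k=1)[0]
    return rng.choice([a for a, c in zip(archive, cells) if c == cell])


archive = [(0.0, 1.0), (0.01, 0.99), (0.02, 0.98), (1.0, 0.0)]
rng = random.Random(42)
picks = [roulette_leader(archive, rng=rng) for _ in range(2000)]
# The isolated point (1.0, 0.0) owns a whole cell, so it is chosen about 75% of the time.
```

The per-call grid construction shows why the cost grows with the number of objectives: the grid is rebuilt over every objective axis, and the number of possible cells grows exponentially with the number of objectives.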
To address the challenges of discrete search spaces, we adapted the HGSO algorithm for
discrete optimization and further optimized the resulting b-HGSO algorithm for complex
medical datasets. We developed a Binary Multi-Objective HGSO (B-MOHGSO),
specifically designed for feature selection in COVID-19 classification tasks.
The binary HGSO uses principles of gas solubility to tackle discrete optimization
challenges effectively, balancing exploration and exploitation to achieve high classification
accuracy through optimized feature subsets. Eight binary versions of HGSO were
developed to evaluate their performance in selecting optimal feature subsets that enhance
classification accuracy while minimizing features. The binary versions explored different
transfer functions:
1. S-shaped transfer functions:
• Provided smooth, gradual transitions between exploration and exploitation;
• Facilitated a gradual shift from exploration to exploitation, leading to stable convergence;
• Resulted in incremental changes to feature subsets.
2. V-shaped transfer functions:
• Offered abrupt transitions with sharp change points between exploration and exploitation;
• Promoted more exploration at first, followed by rapid exploitation;
• Led to faster convergence but with potentially higher instability or variability in
feature subset changes.
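The text does not list the eight transfer functions. A common choice in binary metaheuristics is the S1 to S4 and V1 to V4 families, and we assume them here. The sketch also shows the two usual binarization rules:

- **S-shaped:** the probability sets the bit directly.
- **V-shaped:** the probability decides whether the current bit flips.

This difference underlies the "gradual" versus "abrupt" behaviour described above. Clipping the input at ±30 is our own safeguard against overflow.

```python
import math
import random

# Assumed S1-S4 / V1-V4 families; the thesis' exact functions may differ.
S_SHAPED = {
    "S1": lambda v: 1.0 / (1.0 + math.exp(-2.0 * v)),
    "S2": lambda v: 1.0 / (1.0 + math.exp(-v)),
    "S3": lambda v: 1.0 / (1.0 + math.exp(-v / 2.0)),
    "S4": lambda v: 1.0 / (1.0 + math.exp(-v / 3.0)),
}
V_SHAPED = {
    "V1": lambda v: abs(math.erf(math.sqrt(math.pi) / 2.0 * v)),
    "V2": lambda v: abs(math.tanh(v)),
    "V3": lambda v: abs(v / math.sqrt(1.0 + v * v)),
    "V4": lambda v: abs(2.0 / math.pi * math.atan(math.pi / 2.0 * v)),
}


def binarize(values, bits, name, rng=random):
    """Map continuous values from the HGSO update to a new binary feature mask."""
    new_bits = []
    for v, b in zip(values, bits):
        v = max(-30.0, min(30.0, v))  # clip so math.exp stays well within range
        if name in S_SHAPED:
            # S-shaped rule: the probability is the chance that the bit becomes 1.
            new_bits.append(1 if rng.random() < S_SHAPED[name](v) else 0)
        else:
            # V-shaped rule: the probability is the chance of flipping the current bit.
            new_bits.append(1 - b if rng.random() < V_SHAPED[name](v) else b)
    return new_bits


mask = binarize([0.3, -2.0, 4.0, 0.0], [1, 0, 0, 1], "V2", random.Random(7))
```

With a V-shaped function, a zero step leaves a bit untouched. With an S-shaped function, the same zero step still resets the bit to 0 or 1 at random with probability 0.5.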
The rapid spread of COVID-19 necessitated efficient diagnostic tools in healthcare.
Machine learning has emerged as a crucial technology for early detection and classification
of COVID-19 cases. However, high dimensionality often hampers model performance.
Feature selection plays a vital role in enhancing classifier accuracy by identifying relevant
features from datasets.
Our proposed method aims to improve classification accuracy while minimizing the number
of features used on COVID-19 datasets. Experiments demonstrate that the best-performing
b-HGSO algorithm outperforms existing algorithms in efficiency, selecting fewer
features while maintaining high classification accuracy.
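In wrapper feature selection, each binary mask is scored by training the classifier on the selected columns only. The sketch below uses a plain-Python k-NN and returns the two objectives to minimize: the validation error rate and the fraction of features kept.

The hold-out split, the default `k = 5`, the treatment of an empty mask and the function names are all assumptions. Because the classifier is simply a component of the wrapper, substituting an SVM would change which subsets score best.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=5):
    """Plain k-NN: Euclidean distance and majority vote among the k nearest rows."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def objectives(mask, train_x, train_y, val_x, val_y, k=5):
    """Bi-objective wrapper fitness: (validation error rate, selected-feature ratio)."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 1.0, 0.0  # an empty subset is treated as worst-case error (assumption)

    def project(rows):
        return [[row[j] for j in selected] for row in rows]

    tx, vx = project(train_x), project(val_x)
    errors = sum(knn_predict(tx, train_y, x, k) != y for x, y in zip(vx, val_y))
    return errors / len(val_y), len(selected) / len(mask)


train_x = [[0, 0, 5], [0, 1, -3], [1, 0, 2], [10, 10, 0], [10, 11, 7], [11, 10, -1]]
train_y = [0, 0, 0, 1, 1, 1]
val_x = [[0.5, 0.5, 9], [10.5, 10.5, -9]]
val_y = [0, 1]
score = objectives([1, 1, 0], train_x, train_y, val_x, val_y, k=3)
```

Returning both values, instead of a weighted sum, lets the multi-objective archive keep the whole accuracy/size trade-off.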
The second part of this work focuses on extending the Multi-Objective HGSO (MOHGSO)
algorithm into a novel Binary Multi-Objective HGSO (B-MOHGSO), designed
specifically for discrete optimization problems such as feature selection. This adaptation
aims to improve the algorithm's ability to explore promising regions of the search space
while reducing population size and identifying diverse, high-quality feature subsets.
B-MOHGSO incorporates eight binary versions to investigate how different transfer
functions affect wrapper feature selection performance. A comprehensive study of
S-shaped and V-shaped transfer functions reveals how effectively each navigates discrete
search spaces when combined with state-of-the-art deep learning models and machine
learning classifiers. This approach aims to enhance the accuracy and efficiency
of COVID-19 diagnosis while addressing the complexities inherent in high-dimensional
medical imaging data. The primary objectives of the second part of this study are:
• Develop and compare eight binary versions of the MOHGSO algorithm, exploring
both S-shaped and V-shaped transfer functions for multi-objective feature selection;
• Evaluate the performance of these algorithms when combined with different deep
learning models (AlexNet, GoogleNet, ResNet, and VGG19) for feature extraction
from chest X-ray images;
• Assess the impact of different classifiers, specifically K-Nearest Neighbors (KNN)
and Support Vector Machines (SVM), on the feature selection process and overall
classification accuracy;
• Determine the most effective combination of feature selection method, deep learning
model, and classifier for accurate COVID-19 diagnosis.
In the context of COVID-19 diagnosis, our research yielded several significant findings:
1. V-shaped transfer functions unexpectedly outperformed S-shaped functions:
• Achieved better exploration;
• Attained higher classification accuracy with fewer features;
• This finding contradicted the general expectation that S-shaped functions would perform better.
2. Key factors influencing performance:
• Dataset characteristics: the COVID-19 dataset may have properties
that suit the behavior of V-shaped functions.
• Feature correlation: a high correlation among features could make the sharp
transitions of V-shaped functions more effective at identifying the most informative features.
• Parameter settings: the specific parameter settings of B-MOHGSO might have
been particularly favorable for V-shaped functions in this context.
Finally, this work explores how different classifiers, namely K-Nearest Neighbors (K-NN)
and Support Vector Machines (SVM), influence the feature subsets selected by the
Binary Multi-Objective Henry Gas Solubility Optimization (B-MOHGSO) algorithm, and
can therefore affect its performance. These classifiers were chosen for their established
efficacy in classification tasks, particularly in medical imaging. The K-NN classifier,
valued for its simplicity, performed well with the selected feature subsets, although its
distance computations become less reliable as dimensionality grows. The SVM classifier
outperformed K-NN in this context, achieving higher accuracy and better generalization on
unseen data. This highlights the importance of classifier selection in the feature selection
process, as different classifiers may respond differently to the features identified by the
B-MOHGSO algorithm.
Our experiments with features extracted from the VGG19 model and the V-shaped
transfer function within the B-MOHGSO algorithm highlight the significant influence
of classifier choice on the feature selection process and on overall algorithm performance.
The evaluation results demonstrate that the proposed method significantly enhances
classification performance for COVID-19 detection. Specifically, when the B-MOHGSO
algorithm is paired with the SVM classifier, it outperforms the K-NN pairing in both
classification accuracy and efficiency. This indicates that SVM is better suited than
K-NN to the high-dimensional, noisy data found in medical imaging. The main findings
of our research include:
• The superiority of B-MOHGSO algorithms, particularly those employing V-shaped
transfer functions, in selecting relevant features while maintaining high classification
accuracy;
• The particular effectiveness of the V-shaped transfer function in guiding the optimization
process, allowing a more decisive transition between feature inclusion and exclusion;
• The strong performance of the VGG19 model combined with B-MOHGSO and the SVM
classifier, which achieved high accuracy while significantly reducing the feature set;
• The critical impact of choosing appropriate feature selection methods, transfer functions,
and classifiers when dealing with high-dimensional medical imaging data;
• The crucial role of feature extraction, the optimization-based feature selection algorithm,
and classifier choice in enhancing diagnostic accuracy in healthcare;
• The need for specialized optimization strategies that take into account
the nature of the data and the classifiers used.
We acknowledge that there remains scope for further research and improvement.
Future research directions include:
1. Exploring MOHGSO feature selection with other machine learning algorithms beyond K-NN and SVM;
2. Comparing B-MOHGSO with other state-of-the-art feature selection algorithms;
3. Investigating parameter settings within MOHGSO for performance optimization;
4. Extending the approach to other medical imaging modalities and diseases;
5. Studying the relationship between dataset characteristics and transfer function selection.
This work makes significant contributions to machine learning and optimization research,
particularly in developing effective feature selection techniques for improved data
mining applications. The proposed B-MOHGSO algorithm aims to identify relevant features
efficiently, addressing the complexities of high-dimensional medical imaging data while
enhancing diagnostic accuracy and efficiency in clinical settings. The successful application
to COVID-19 diagnosis demonstrates its potential impact on healthcare systems
worldwide, offering promising tools for managing future health crises.
Keywords
Data mining, Feature selection, Multi-objective Henry Gas Solubility Optimizer (MOHGSO), COVID-19 classification, Deep learning, Transfer functions.