Using Multi-objective Meta-heuristics for Data Mining

Soumaia KAHLOUL

Files

Primary KS Thesis.pdf (11.29 MB)

Date

2025-01-05

Authors

Soumaia KAHLOUL

Publisher

University of Bordj Bou Arreridj

Abstract

The ability to extract knowledge from large datasets is essential for innovation and informed decision-making, a process known as knowledge extraction or data mining. Traditional methods often fall short in fully utilizing data potential, necessitating the development of new algorithms for better insights. This thesis explores an innovative approach by integrating deep learning with advanced feature selection techniques to improve the classification accuracy of COVID-19 cases from chest X-ray images. The dataset includes X-ray images categorized as COVID-19, pneumonia, and normal. We employ the Binary Multi-Objective Henry Gas Solubility Optimization Algorithm (B-MOHGSO) for feature selection and leverage models like AlexNet, VGG19, GoogleNet, and ResNet for feature extraction. Eight versions of B-MOHGSO were tested, with k-nearest neighbors (k-NN) as the classifier. The study highlights the significant impact of S-shaped and V-shaped transfer functions on binary transformations and classifier performance in high-dimensional medical imaging. Notably, B-MOHGSO algorithms, particularly those using V-shaped transfer functions, excelled in selecting relevant features while maintaining high accuracy. When combined with the VGG19 model and SVM classifier, B-MOHGSO significantly reduced the feature set without sacrificing performance. The application of B-MOHGSO in COVID-19 classification is crucial for identifying key features that enhance diagnostic processes and treatment strategies. By adapting MOHGSO for discrete optimization, this research aims to address the complexities of high-dimensional medical data and improve healthcare analytics outcomes.

Description

During this work, we demonstrated that implementing a new metaheuristic algorithm named MOHGSO for multi-objective problems can significantly enhance the efficiency and effectiveness of solving complex optimization tasks. Extending this algorithm to feature selection leverages its strengths to identify optimal feature subsets, leading to improved performance in various applications such as machine learning and data mining. These results suggest that MOHGSO-based feature selection is a promising approach for supervised classification tasks in data mining.

The focal point of this thesis is the development and application of a new physics-based algorithm, the Henry Gas Solubility Optimization (HGSO) algorithm, originally designed for continuous optimization problems. This thesis presents significant contributions to multi-objective optimization and feature selection in classification tasks through the exploration of physics-inspired optimization techniques in machine learning by introducing the binary version of HGSO based on gas solubility principles.

Initially, we developed a population-based metaheuristic that simulates Henry’s law for solving multi-objective optimization problems (MOPs) before applying it to specific data mining challenges. The performance of our approach relies on two critical factors: managing multi-external archives when new non-dominated solutions are found and selecting guiding individuals (multi-leaders) to direct their teammates toward promising regions, which is essential for balancing convergence (finding good solutions) and diversity (exploring different search areas).

The first part of this work focuses on exploring strategies for archive management and leader selection to enhance the performance and robustness of our proposed algorithm. Through the development of MOHGSO, we introduced a novel multi-objective optimization algorithm based on Henry’s law that demonstrates effectiveness in continuous optimization problems.

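The archive management mentioned above rests on Pareto dominance. The following is a minimal sketch of how an external archive of non-dominated solutions could be updated when a new candidate arrives. It assumes all objectives are minimized and an unbounded archive; the names `dominates` and `update_archive` are illustrative and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (assumption: every objective is minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def update_archive(archive: List[Point], candidate: Sequence[float]) -> List[Point]:
    """Offer a candidate objective vector to the archive and return the new archive.

    The candidate is rejected if an archived member dominates or duplicates it.
    Otherwise it is added and every member it dominates is dropped.
    """
    cand = tuple(candidate)
    if any(dominates(member, cand) or member == cand for member in archive):
        return list(archive)
    kept = [member for member in archive if not dominates(cand, member)]
    kept.append(cand)
    return kept


if __name__ == "__main__":
    archive: List[Point] = []
    for point in [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (0.5, 6.0), (1.0, 4.0)]:
        archive = update_archive(archive, point)
    print(sorted(archive))
```

A bounded archive would additionally need a pruning rule once it is full, which is where a density measure such as crowding distance or a grid comes in.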
The performance heavily depended on two crucial elements:
1. Management of multi-external archives for new non-dominated solutions;
2. Selection of guiding individuals (multi-leaders) to direct teammates toward promising regions.

Comparative analyses within MOHGSO for MOPs (crowding distance with random selection versus grid mechanism with roulette wheel) revealed distinct strengths for each approach. For example, in ZDT functions, both strategies performed well, with crowding distance showing an advantage in complex variants like ZDT6, while the grid mechanism was more efficient for simpler ones like ZDT1. Similarly, DTLZ functions showed crowding distance’s benefits for highly multimodal problems such as DTLZ3, while the grid mechanism suited less complex variants like DTLZ1. The UF suite highlighted crowding distance’s robustness across varied complexities.

A comparative analysis evaluated MOHGSO against MOGWO and MSSA algorithms across four engineering design problems: Four-bar truss design, Speed reducer design, Disk brake design, and Welded beam design. Findings indicate our proposed algorithm outperforms its competitors across all problems.

The crowding distance with random selection strategy maintained diversity effectively in complex, high-dimensional problems but occasionally sacrificed convergence speed. Conversely, the grid mechanism with roulette wheel excelled in low-dimensional problems but struggled with high-dimensional variants due to increased computational costs.

Comparative analysis of two strategies within MOHGSO revealed distinct strengths:
1. Crowding distance with random selection:
   • Demonstrated strong performance in maintaining diversity;
   • Excelled in complex, high-dimensional, or highly multimodal problems;
   • Showed flexibility across various problem types;
   • Occasionally sacrificed convergence speed for diversity.
2. Grid mechanism with roulette wheel:
   • Excelled in ensuring even distribution of solutions in low-dimensional problems;
   • Balanced diversity and convergence effectively for simpler MOPs;
   • Showed decreased efficiency as objectives increased due to computational demands.

To address discrete search space challenges, we adapted the HGSO algorithm for discrete optimization and further optimized the b-HGSO algorithm for complex datasets in the medical field. We developed a Binary-Multi-Objective HGSO (B-MOHGSO), specifically designed for feature selection applied to COVID-19 classification tasks.

The binary HGSO utilizes principles of gas solubility to tackle discrete optimization challenges effectively by balancing exploration and exploitation to achieve high classification accuracy through optimized feature subsets. Eight binary versions of HGSO were developed to evaluate their performance in selecting optimal feature subsets that enhance classification accuracy while minimizing features. The binary versions explored different transfer functions:
1. S-shaped transfer functions:
   • Provided smooth, gradual transitions between exploration and exploitation;
   • Facilitated a gradual shift from exploration to exploitation, leading to stable convergence;
   • Resulted in incremental changes to feature subsets.
2. V-shaped transfer functions:
   • Offered abrupt transitions with sharp change points between exploration and exploitation;
   • Promoted more exploration at first, followed by rapid exploitation;
   • Led to faster convergence but with potentially higher instability or variability in feature subset changes.

The rapid spread of COVID-19 necessitated efficient diagnostic tools in healthcare. Machine learning has emerged as a crucial technology for early detection and classification of COVID-19 cases. However, high dimensionality often hampers model performance.

Feature selection plays a vital role in enhancing classifier accuracy by identifying relevant features from datasets.

Our proposed method aims to improve classification accuracy while minimizing the number of features used in COVID-19 datasets. Experiments demonstrate that the best-performing b-HGSO algorithm outperforms existing algorithms in efficiency and selects fewer features while maintaining high classification accuracy.

The second part of this work focuses on extending the Multi-Objective HGSO (MOHGSO) algorithm into a novel Binary-Multi-Objective HGSO (B-MOHGSO), designed specifically for discrete optimization problems like feature selection. This adaptation aims to improve B-MOHGSO’s ability to explore promising regions within search spaces while reducing population size and identifying diverse, high-quality feature subsets.

B-MOHGSO incorporates eight different binary versions to investigate the impact of various transfer functions on performance in wrapper feature selection. A comprehensive study of S-shaped and V-shaped transfer functions reveals their effects on navigating discrete search spaces effectively when merged with state-of-the-art deep learning models and machine learning classifiers. This novel approach aims to enhance the accuracy and efficiency of COVID-19 diagnosis while addressing the complexities inherent in high-dimensional medical imaging data.

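To illustrate how S-shaped and V-shaped transfer functions map a continuous position onto a binary feature mask, here is a minimal sketch. The eight functions used in the thesis are not listed on this page, so the logistic sigmoid and |tanh| below are assumed representatives that are common in the binary metaheuristics literature. The update rules (set the bit for S-shaped, flip the current bit for V-shaped) follow the usual convention and may differ from the thesis. All function names are illustrative.

```python
import math
import random
from typing import List, Sequence


def s_shaped(x: float) -> float:
    """Logistic sigmoid, an assumed S-shaped transfer function; returns a value in [0, 1]."""
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-x))


def v_shaped(x: float) -> float:
    """|tanh(x)|, an assumed V-shaped transfer function; returns a value in [0, 1)."""
    return abs(math.tanh(x))


def binarize_s(position: Sequence[float], rng: random.Random) -> List[int]:
    """S-shaped rule: each bit is set to 1 with probability s_shaped(x), else 0."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position: Sequence[float], bits: Sequence[int],
               rng: random.Random) -> List[int]:
    """V-shaped rule: each current bit is flipped with probability v_shaped(x)."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, bits)]


if __name__ == "__main__":
    rng = random.Random(42)
    position = [-2.0, -0.1, 0.0, 0.1, 2.0]
    current = [1, 0, 1, 0, 1]
    print(binarize_s(position, rng))
    print(binarize_v(position, current, rng))
```

Under the V-shaped rule a position component near zero leaves the bit unchanged, while a large magnitude of either sign makes a flip likely; this is one way to read the sharper transitions attributed to V-shaped functions.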
The primary objectives of the second part of this study are:
• Develop and compare eight binary versions of the MOHGSO algorithm, exploring both S-shaped and V-shaped transfer functions for multi-objective feature selection;
• Evaluate the performance of these algorithms when combined with different deep learning models (AlexNet, GoogleNet, ResNet, and VGG19) for feature extraction from chest X-ray images;
• Assess the impact of different classifiers, specifically K-Nearest Neighbors (KNN) and Support Vector Machines (SVM), on the feature selection process and overall classification accuracy;
• Determine the most effective combination of feature selection method, deep learning model, and classifier for accurate COVID-19 diagnosis.

In the context of COVID-19 diagnosis, our research yielded several significant findings:
1. V-shaped transfer functions unexpectedly outperformed S-shaped functions:
   • Achieved better exploration;
   • Attained higher classification accuracy with fewer features;
   • This finding contradicted general assumptions about S-shaped functions.
2. Key factors influencing performance:
   • Dataset characteristics: the COVID-19 dataset may have unique properties that suit the behavior of V-shaped functions.
   • Feature correlation: a high correlation among features could make the sharp transitions of V-shaped functions more effective at identifying the most informative features.
   • Parameter settings: the specific parameter settings of B-MOHGSO might have been particularly favorable for V-shaped functions in this context.

Finally, this work explores how different classifiers, such as K-Nearest Neighbors (K-NN) and Support Vector Machines (SVM), influence the feature subsets selected by the Binary Multi-Objective Henry Gas Solubility Optimization (B-MOHGSO) algorithm, potentially affecting our algorithm’s performance.

These classifiers were chosen due to their established efficacy in handling classification tasks, particularly in medical imaging contexts. The KNN classifier, known for its simplicity and effectiveness in high-dimensional spaces, performed well with the selected feature subsets. However, the SVM classifier outperformed KNN in this context, achieving higher accuracy and better generalization on unseen data. This highlights the importance of classifier selection in the feature selection process, as different classifiers may respond differently to the features identified by the MOHGSO algorithm.

The dual focus on using features extracted from the VGG19 model and the V-shaped transfer function within the B-MOHGSO algorithm highlights the significant influence of classifier choice on the feature selection process and overall algorithm performance. The results of the evaluation demonstrated that the proposed method significantly enhances classification performance for COVID-19 detection. Specifically, the study demonstrates that when the B-MOHGSO algorithm is paired with the SVM classifier, it outperforms the K-NN classifier in terms of classification accuracy and efficiency. SVM is better suited than K-NN to this application, given the high dimensionality and noise present in medical imaging data.

The main findings from our research include:
• The superiority of B-MOHGSO algorithms, particularly those employing V-shaped transfer functions, in selecting relevant features while maintaining high classification accuracy;
• The V-shaped transfer function is particularly effective in guiding the optimization process, allowing for a more decisive transition between feature inclusion and exclusion;
• The exceptional performance of the VGG19 model when combined with B-MOHGSO and the SVM classifier, achieving a remarkable accuracy while significantly reducing the feature set;
• The critical impact of choosing appropriate feature selection methods, transfer functions, and classifiers when dealing with high-dimensional medical imaging data;
• The crucial role of feature extraction, the feature selection optimization algorithm, and classifier choice in enhancing diagnostic accuracy in healthcare;
• The need for specialized optimization strategies that take into account the nature of the data and classifiers used.

Finally, we acknowledge that there is scope for further research and improvement. Future research directions include:
1. Exploring MOHGSO feature selection with other machine learning algorithms beyond K-NN and SVM;
2. Comparing B-MOHGSO with other state-of-the-art feature selection algorithms;
3. Investigating parameter settings within MOHGSO for performance optimization;
4. Extending the approach to other medical imaging modalities and diseases;
5. Studying the relationship between dataset characteristics and transfer function selection.

This work makes significant contributions to machine learning and optimization research, particularly in developing effective feature selection techniques for improved data mining applications. The proposed B-MOHGSO algorithm aims to identify relevant features efficiently, addressing the complexities of high-dimensional medical imaging data while enhancing diagnostic accuracy and efficiency in clinical settings. The successful application to COVID-19 diagnosis demonstrates its potential impact on healthcare systems worldwide, offering promising tools for managing future health crises.
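The two goals that wrapper feature selection balances throughout this description, low classification error and few selected features, can be sketched as a pair of objectives evaluated on a binary feature mask. This is only an illustration under stated assumptions: it uses a plain 1-nearest-neighbour classifier with leave-one-out validation written in pure Python, the toy data is invented, and the function names are hypothetical. The exact fitness formulation and validation protocol of the thesis are not given on this page.

```python
from typing import List, Sequence, Tuple


def knn_loo_error(X: Sequence[Sequence[float]], y: Sequence[int],
                  mask: Sequence[int]) -> float:
    """Leave-one-out error rate of a 1-NN classifier restricted to the masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # assumption: an empty feature subset counts as the worst error
    errors = 0
    for i, xi in enumerate(X):
        best_dist, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query sample out of its own neighbourhood
            dist = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        if best_label != y[i]:
            errors += 1
    return errors / len(X)


def objectives(X: Sequence[Sequence[float]], y: Sequence[int],
               mask: Sequence[int]) -> Tuple[float, float]:
    """Return (error rate, fraction of features selected); both are to be minimized."""
    return knn_loo_error(X, y, mask), sum(mask) / len(mask)


if __name__ == "__main__":
    # Invented toy data: feature 0 separates the classes, feature 1 is misleading noise.
    X: List[List[float]] = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
    y = [0, 0, 1, 1]
    for mask in ([1, 0], [0, 1], [1, 1]):
        print(mask, objectives(X, y, mask))
```

On this toy data the single informative feature yields a lower error than the full feature set, which shows why a smaller subset need not cost accuracy.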

Keywords

Data mining, Feature selection, Multi-objective Henry Gas Solubility Optimizer (MOHGSO), COVID-19 classification, Deep learning, Transfer functions.

URI

https://dspace.univ-bba.dz/handle/123456789/21

Collections

Doctorat Recherche Opérationnelle
