Impact of preprocessing techniques on the performance of diabetes classification models
Date
2025
Publisher
University of Bordj Bou Arreridj
Abstract
Diabetes is a chronic disease for which early diagnosis is crucial to prevent serious complications. In this work, we study the impact of preprocessing techniques on the performance of classification models applied to diabetes data. To this end, we use two medical datasets: the Pima Indians dataset and a local dataset from Iraq. We evaluate three classification algorithms: logistic regression, support vector machines (SVM), and decision trees. We apply two normalization techniques (MinMaxScaler and StandardScaler) and three feature selection methods (SelectKBest, GenericUnivariateSelect, SelectFromModel). The results, evaluated using cross-validation, show that a well-chosen preprocessing strategy significantly improves model accuracy, although the gains vary with the nature of the data and the algorithm used.
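The abstract does not give the exact experimental configuration, so the following is only a minimal sketch of how such a comparison might be set up. It assumes scikit-learn, whose class names (MinMaxScaler, StandardScaler, SelectKBest, GenericUnivariateSelect, SelectFromModel) the abstract uses. The synthetic data from `make_classification` stands in for the real datasets. The choices k=5, `f_classif` scoring, `mode="k_best"`, a decision tree inside SelectFromModel, and 10 stratified folds are illustrative assumptions, not the authors' settings. The helper `evaluate` is hypothetical.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.feature_selection import (
    GenericUnivariateSelect,
    SelectFromModel,
    SelectKBest,
    f_classif,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in shaped like the Pima Indians data (768 rows, 8 features);
# replace with the real feature matrix X and labels y.
X, y = make_classification(n_samples=768, n_features=8, n_informative=5, random_state=0)

# Assumed hyperparameters, chosen only for illustration.
scalers = {"minmax": MinMaxScaler(), "standard": StandardScaler()}
selectors = {
    "kbest": SelectKBest(f_classif, k=5),
    "generic": GenericUnivariateSelect(f_classif, mode="k_best", param=5),
    "from_model": SelectFromModel(DecisionTreeClassifier(random_state=0)),
}
classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "tree": DecisionTreeClassifier(random_state=0),
}


def evaluate(X, y, n_splits=10):
    """Hypothetical helper: mean CV accuracy for every scaler/selector/model combination."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for (s_name, scaler), (f_name, selector), (c_name, clf) in product(
        scalers.items(), selectors.items(), classifiers.items()
    ):
        # Scaling and selection sit inside the Pipeline, so cross_val_score refits
        # them on each training fold only and the held-out fold never leaks in.
        pipe = Pipeline([("scale", scaler), ("select", selector), ("clf", clf)])
        scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
        results[(s_name, f_name, c_name)] = scores.mean()
    return results


results = evaluate(X, y)
best = max(results, key=results.get)
print(best, round(results[best], 3))
```

Putting the preprocessing steps inside a `Pipeline` matters for this kind of study. If the data were scaled or filtered once before splitting, the held-out folds would influence the preprocessing and the cross-validated accuracy would be optimistic.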
Keywords
Diabetes, Classification, Preprocessing, Feature Selection, Normalization, Cross-Validation.