Abstract:
This thesis addresses a major challenge in cancer research: identifying the genes most
relevant for cancer classification. To this end, a three-step approach was adopted. First,
classification algorithms were applied directly to the raw biochip (DNA microarray) datasets. Next,
data quality was improved through preprocessing, and the classification algorithms were
applied again. Finally, the most relevant genes were selected from the preprocessed data using
mutual-information-based filter methods, and the same classification algorithms were applied
once more. The results show that, after relevant-gene selection, the support vector machine (SVM)
achieved a 100% classification rate on most of the datasets used. The neural network
also performed well in classifying cancer types.
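The abstract does not say which mutual-information filter was used. The sketch below shows the general idea of mutual-information filtering in plain Python under stated assumptions. Each gene's expression values are discretized into equal-width bins. The mutual information I(gene; class) is then computed, and the top-k genes are kept. The binning scheme, the value of k, and the function names (`discretize`, `mutual_information`, `select_top_genes`) are hypothetical illustrations, not the thesis's actual method.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-width binning of one gene's expression values (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant gene: a single bin
    width = (hi - lo) / n_bins
    # min(..., n_bins - 1) places the maximum value in the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x, y):
    """I(X;Y) in nats for two equal-length sequences of discrete symbols."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def select_top_genes(samples, labels, k, n_bins=3):
    """Rank genes (columns of samples) by MI with the class labels; keep top k."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        column = [row[g] for row in samples]
        scores.append(mutual_information(discretize(column, n_bins), labels))
    # Stable sort: ties keep their original gene order
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples x 3 genes, two cancer classes
    samples = [
        [0.1, 5.0, 2.0],
        [0.2, 5.0, 9.0],
        [0.9, 5.0, 2.5],
        [1.0, 5.0, 8.5],
    ]
    labels = ["A", "A", "B", "B"]
    print(select_top_genes(samples, labels, 1))  # prints [0]
```

In the toy data, gene 0 separates the two classes perfectly, with I = ln 2. Gene 1 is constant, and gene 2 varies independently of the class. Both of these score 0, so only gene 0 is retained. In the described pipeline, the retained genes would then be passed to the classifiers (e.g. the SVM) in the final step.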