EXPLORATION, VISUALISATION ET APPRENTISSAGE SUPERVISÉ SUR LES DONNÉES COVID-19 (Exploration, Visualization and Supervised Learning on COVID-19 Data)
Date
2025
Publisher
University of Bordj Bou Arreridj
Abstract
This thesis presents an end-to-end application of data science to a real-world healthcare
case: the analysis and modeling of clinical data related to COVID-19. Working with a dataset of
more than 5,000 records and 100 variables, we followed the main stages of a data science project.
The process began with thorough data preprocessing: cleaning, encoding, handling of missing
values, and validation of the dataset. We then carried out a detailed exploratory data analysis
(EDA) to examine variable distributions, relationships between variables, and recurring patterns.
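The preprocessing and exploration steps described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the code used in the thesis: the column names (`patient_age_quantile`, `hemoglobin`, `influenza_a`, `empty_marker`, `sars_cov_2_exam_result`), the tiny inline table, the 90 % missing-value threshold and the choice of median imputation are all hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, target: str, max_missing: float = 0.9) -> pd.DataFrame:
    """Clean, encode, impute and validate a raw clinical table (illustrative sketch)."""
    # Cleaning: remove exact duplicate rows.
    df = df.drop_duplicates()

    # Cleaning: drop columns whose share of missing values exceeds the (assumed) threshold.
    df = df.loc[:, df.isna().mean() <= max_missing].copy()

    # Encoding: binary target, then one-hot encoding of the remaining non-numeric columns.
    df[target] = df[target].map({"negative": 0, "positive": 1})
    categorical = [c for c in df.select_dtypes(exclude="number").columns if c != target]
    if categorical:
        df = pd.get_dummies(df, columns=categorical, dtype=int)

    # Missing values: median imputation of every feature column (assumed strategy).
    features = df.columns.drop(target)
    df[features] = df[features].fillna(df[features].median())

    # Validation: no missing values left and only 0/1 labels in the target.
    assert not df.isna().any().any(), "missing values remain"
    assert set(df[target].unique()) <= {0, 1}, "unexpected target labels"
    return df


# Hypothetical toy table: one duplicated row, scattered gaps and an empty column.
raw = pd.DataFrame({
    "patient_age_quantile": [3, 10, 10, 15, 7, 18],
    "hemoglobin": [0.2, np.nan, np.nan, -1.1, 0.5, np.nan],
    "influenza_a": ["not_detected", "detected", "detected", None, "not_detected", "not_detected"],
    "empty_marker": [np.nan] * 6,
    "sars_cov_2_exam_result": ["negative", "positive", "positive", "negative", "negative", "positive"],
})

clean = preprocess(raw, target="sars_cov_2_exam_result")
print(clean.shape)  # (5, 5): duplicate row and empty column removed, influenza_a one-hot encoded

# Exploratory checks: summary statistics and correlation of each feature with the target.
print(clean.describe().loc[["mean", "std", "min", "max"]].round(2))
print(clean.corr()["sars_cov_2_exam_result"].drop("sars_cov_2_exam_result").sort_values())
```

In a full analysis these tables would normally be complemented by plots (for example distribution plots or a correlation heatmap drawn with Seaborn); they are left out here to keep the sketch short.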
The core of the project is supervised learning. We trained four classification algorithms
(Random Forest, Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and AdaBoost) and
evaluated them with metrics such as accuracy, F1-score, and ROC-AUC. This comparison allowed
us to select the most effective model for predicting SARS-CoV-2 test results.
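A comparison of the four classifiers on these metrics could look like the scikit-learn sketch below. It is an illustration only: the data are synthetic (`make_classification` with an assumed 90/10 class imbalance) rather than the thesis dataset, the hyperparameters are arbitrary, and selecting the winner by highest ROC-AUC is an assumed rule, not necessarily the criterion the authors used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the clinical table: imbalanced binary target (assumption).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # SVM and KNN depend on distances/margins, so features are standardised first.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=0)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=7)),
    "AdaBoost": AdaBoostClassifier(n_estimators=200, random_state=0),
}


def positive_scores(model, X):
    """Continuous scores for the positive class, as required by ROC-AUC."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)[:, 1]
    return model.decision_function(X)  # e.g. SVC trained without probability=True


results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, positive_scores(model, X_test)),
    }

for name, m in results.items():
    print(f"{name:13s} acc={m['accuracy']:.3f} f1={m['f1']:.3f} auc={m['roc_auc']:.3f}")

# Selection rule (assumption): keep the model with the highest ROC-AUC.
best = max(results, key=lambda name: results[name]["roc_auc"])
print("Selected model:", best)
```

With an imbalanced target, accuracy alone can look high for a model that rarely predicts the positive class, which is why F1-score and ROC-AUC are reported alongside it in this sketch.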
The work was carried out in Python on Google Colab, using libraries such as scikit-learn,
pandas, and Seaborn. We also drew on educational resources, notably the Machine Learnia
tutorials, to refine our methodology.
In conclusion, this thesis illustrates the potential of data science in healthcare, while also
highlighting the technical, ethical, and operational challenges of integrating artificial
intelligence into medical decision-making systems.