The Use of Big Data and Data Analytics in the Prevention, Diagnosis and Prediction of Long-Term Diseases
Date
2026
Publisher
University of Bordj Bou Arreridj
Abstract
The increasing prevalence of long-term diseases, particularly diabetes, presents significant challenges
to global healthcare systems. Early prediction, accurate diagnosis, and continuous monitoring
are crucial for improving patient outcomes and reducing healthcare costs. This thesis
explores the use of Big Data and data analytics in the prevention, diagnosis, and monitoring of
long-term diseases, focusing specifically on diabetes. The core objective is the development of
an integrated system that supports individuals throughout the disease lifecycle. The proposed
system is structured into three main phases: first, the creation of predictive algorithms capable
of estimating an individual’s risk of developing diabetes within a ten-year period; second,
the application of explainable neural networks to diagnose diabetes based on retinal imaging,
ensuring transparency and trust in AI-driven decisions; and third, the development of a digital
platform to continuously monitor patients, facilitating proactive management and personalized
care. By leveraging machine learning, Big Data technologies, and explainable AI, this work
aims to contribute to a more predictive, preventive, and participatory healthcare model for
chronic disease management.
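The first phase, estimating an individual's ten-year risk of developing diabetes, is commonly framed as binary classification with a probability as output. The sketch below is minimal and assumes a logistic-regression model. The feature names and coefficient values are hypothetical placeholders. They are not fitted to any data, not clinically validated, and not the model developed in this thesis.

```python
import math

# Hypothetical coefficients for illustration only: NOT fitted to any data.
INTERCEPT = -7.0
COEFFICIENTS = {
    "age_years": 0.04,
    "bmi": 0.08,
    "fasting_glucose_mmol_l": 0.35,
    "family_history": 0.6,  # 1 if a first-degree relative has diabetes, else 0
}


def ten_year_risk(features: dict) -> float:
    """Logistic model: risk = sigmoid(b0 + sum(b_i * x_i)), a probability in (0, 1)."""
    linear = INTERCEPT + sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-linear))


patient = {"age_years": 55, "bmi": 31.0, "fasting_glucose_mmol_l": 6.2, "family_history": 1}
print(f"Estimated 10-year risk: {ten_year_risk(patient):.1%}")
```

In practice, the coefficients would be learned from longitudinal cohort data. The logistic form is only one option; tree ensembles or survival models are common alternatives for time-bounded risk.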
Description
Machine learning (ML) and deep learning (DL) techniques have garnered significant attention
in medical diagnosis and healthcare due to their ability to analyze complex datasets, such as
medical images, clinical records, and genetic information. These approaches are particularly
effective in supporting healthcare professionals in the diagnosis and management of chronic
diseases like diabetes, where ML and DL can detect subtle patterns in medical imaging and
other data sources that may be challenging for humans to discern.
Diabetes presents unique challenges for healthcare systems due to the complexity of its
diagnosis, progression monitoring, and individualized management requirements. Traditionally,
diabetes diagnosis and monitoring rely on blood tests to measure glucose levels, HbA1c, and
other biomarkers. However, these methods are invasive, may require lab facilities, and can be
cumbersome for continuous monitoring. Therefore, there is a growing need for accurate, non-invasive,
and readily accessible diagnostic methods, particularly for early detection and risk
assessment. ML and DL techniques show promise in addressing these challenges by leveraging
data from imaging modalities such as retinal fundus photography, MRI, and CT scans to build
reliable diagnostic models.
Imaging data, especially retinal images, are widely used in diabetes diagnosis and monitoring
because of their ability to reveal microvascular changes related to diabetic retinopathy,
one of the most common diabetes-related complications. ML and DL models trained on these
images can offer significant diagnostic support by detecting early indicators of diabetes and related
complications, reducing the dependency on invasive testing. This thesis seeks to develop
complications, reducing the dependency on invasive testing. This thesis seeks to develop
a quick, accurate, and accessible diagnostic approach for diabetes by leveraging ML and DL
techniques with a focus on image-based analysis.
In line with these goals, a platform was developed as part of this research to support the seamless monitoring and management of diabetes. The platform integrates ML and DL
diagnostic models into intuitive user interfaces, enabling healthcare professionals to upload and
analyze patient imaging data in real time. It includes features for continuous monitoring,
automated risk stratification, and alert generation for critical findings. Designed with a focus
on clinical applicability, the platform supports secure data handling and role-based access for
healthcare staff. A mobile-compatible module was also implemented to enable remote screening
and monitoring, especially in resource-limited settings. These tools not only enhance diagnostic
efficiency but also facilitate timely interventions and follow-up by providing actionable insights
at the point of care.
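Automated risk stratification and alert generation can be as simple as mapping a model's output probability onto tiers and flagging critical findings. The sketch below is minimal and assumes hypothetical cut-offs (0.2 and 0.5) and a hypothetical `Finding` record. In the platform described here, the actual thresholds and alert rules would need to be set and validated clinically.

```python
from dataclasses import dataclass

# Hypothetical cut-offs chosen for illustration; real thresholds need clinical validation.
LOW_MAX = 0.2
MODERATE_MAX = 0.5


@dataclass
class Finding:
    patient_id: str
    risk: float         # model output probability in [0, 1]
    referable_dr: bool  # hypothetical flag: imaging model reports referable retinopathy


def stratify(risk: float) -> str:
    """Map a probability to a risk tier."""
    if risk < LOW_MAX:
        return "low"
    if risk < MODERATE_MAX:
        return "moderate"
    return "high"


def alerts(findings: list[Finding]) -> list[str]:
    """Return alert messages for high-risk patients or referable retinopathy."""
    messages = []
    for f in findings:
        tier = stratify(f.risk)
        if tier == "high" or f.referable_dr:
            messages.append(f"ALERT {f.patient_id}: tier={tier}, referable_dr={f.referable_dr}")
    return messages


batch = [Finding("P001", 0.12, False), Finding("P002", 0.63, False), Finding("P003", 0.30, True)]
for message in alerts(batch):
    print(message)
```

An imaging finding triggers an alert independently of the tabular risk tier. Combining the two sources this way reflects the platform's aim of surfacing critical findings at the point of care.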
The research questions guiding this thesis focus on (1) the feasibility of using ML-based
diagnostic systems to match or complement the performance of traditional glucose-monitoring
methods, (2) the necessity of DL methods in developing a robust diabetes diagnosis system
based on imaging data, (3) strategies for addressing class imbalance issues in available diabetes
datasets, particularly in large imaging datasets such as retinal images from diabetic and non-diabetic
patients, and (4) the development of platforms for deploying ML/DL models in clinical
settings with monitoring and decision-making capabilities. Specific objectives were set to
provide a thorough theoretical background in ML, DL, dimensionality reduction, and data
augmentation, followed by a detailed literature review of diabetes detection studies. A total
of 40 studies were categorized into ML-based, DL-based, and comparative analyses. This
analysis spanned approaches from ML to DL, feature extraction and selection techniques, and
data augmentation and class-balancing strategies. Key limitations identified in the review
included the need for high-quality, well-annotated imaging data, the lack of sufficient diabetic
samples in some datasets, and the limited reporting of sensitivity, a metric often overlooked in
existing studies, with reported sensitivity values ranging from 73% to 81.2%.
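Sensitivity (recall on the diabetic class) is the proportion of truly diabetic cases that a model detects, TP / (TP + FN). On imbalanced data, a high overall accuracy can hide a low sensitivity. The short sketch below uses made-up labels to show both metrics computed from confusion-matrix counts.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false negatives, true negatives and false positives."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, tn, fp


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Illustrative labels only: 2 diabetic (1) patients among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"accuracy={accuracy:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Here the model reaches 90% accuracy while missing half of the diabetic patients. This is why sensitivity is worth reporting alongside accuracy.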
Machine learning (ML) and deep learning (DL) are increasingly being used for diabetes
diagnosis, particularly through analyzing imaging data. These technologies can identify complex
patterns in medical images, such as retinal scans, that may be subtle or challenging for
human clinicians to discern. This capability enhances the accuracy of early diagnosis, supports
disease progression monitoring, and aids in predicting potential complications associated
with diabetes. For instance, retinal fundus photography is commonly used to detect diabetic
retinopathy, while MRIs, CT scans, and foot thermography are employed to identify other
complications like neuropathy and cardiovascular risks.
In ML/DL applications for diabetes imaging, convolutional neural networks (CNNs) are
widely used due to their strength in feature extraction. These models are trained to recognize
visual cues such as microaneurysms or hemorrhages in retinal images, which signal diabetic
retinopathy. The accuracy of these models can be high, often rivaling the diagnostic capabilities
of trained professionals. However, the effectiveness of ML and DL models is heavily influenced
by the quality of the data, the specific model architecture, and the type of complication being addressed. Real-world validation is essential to ensure these models perform well in diverse
clinical environments.
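The feature extraction that suits CNNs to this task rests on convolution: a small kernel slides over the image and responds strongly where a local pattern matches it. The toy sketch below is written in pure Python. It assumes a hand-set kernel that responds to a single dark pixel on a bright background, loosely analogous to a microaneurysm. In a real CNN the kernel weights are learned during training, many kernels and layers are stacked, and the input is a full fundus image rather than a 5x5 patch.

```python
def conv2d_valid(image, kernel):
    """2-D cross-correlation without padding (the operation CNN layers call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Keep positive responses only, as a CNN activation layer would."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hand-set "dark spot" detector: weights sum to zero, so uniform regions give 0.
DARK_SPOT_KERNEL = [
    [1, 1, 1],
    [1, -8, 1],
    [1, 1, 1],
]

# 5x5 toy grey-scale patch (1.0 = bright) with one dark pixel at row 2, column 3.
patch = [[1.0] * 5 for _ in range(5)]
patch[2][3] = 0.0

response = relu(conv2d_valid(patch, DARK_SPOT_KERNEL))
peak = max((v, i, j) for i, row in enumerate(response) for j, v in enumerate(row))
print("feature map:", response)
# Output position (1, 2) is the window centred on input pixel (2, 3).
print("strongest response at output position", peak[1:])
```

Only the window centred on the dark pixel produces a positive response. Learned CNN filters exploit the same locality, which is what lets them pick up small lesions.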
While ML and DL present innovative tools for diabetes diagnosis, they are best viewed as
complements to traditional methods, not replacements. These models can help streamline diagnosis
and identify at-risk patients, but confirmation through standard tests, like blood glucose
measurements, remains crucial. There are challenges in developing these models, including the
need for high-quality, annotated images, handling class imbalances (where non-diabetic cases
often outnumber diabetic cases), and ensuring data privacy and security. Ethical considerations,
such as mitigating model bias and maintaining transparency in predictions, are also
paramount, especially in clinical settings.
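One common way to handle the class imbalance noted above, where non-diabetic cases outnumber diabetic ones, is to oversample the minority class. The extra samples can be created by augmenting existing images rather than duplicating them exactly. The sketch below is minimal and written in pure Python. It assumes images are small 2-D lists and uses only horizontal flips and 90-degree rotations. `balance_by_augmentation` is a hypothetical helper, not the implementation used in this thesis. Oversampling should be applied to the training split only, so that augmented copies of one patient's image do not leak into the test set.

```python
import random


def hflip(img):
    """Mirror an image left to right."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


AUGMENTATIONS = [hflip, rot90, lambda im: rot90(rot90(im)), lambda im: hflip(rot90(im))]


def balance_by_augmentation(images, labels, minority_label=1, seed=0):
    """Append augmented copies of minority-class images until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [img for img, y in zip(images, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    deficit = majority_count - len(minority)
    if deficit > 0 and not minority:
        raise ValueError("no minority-class images to augment")
    new_images, new_labels = list(images), list(labels)
    for _ in range(max(0, deficit)):
        source = rng.choice(minority)
        augment = rng.choice(AUGMENTATIONS)
        new_images.append(augment(source))
        new_labels.append(minority_label)
    return new_images, new_labels


# Toy data: 5 non-diabetic (0) vs 2 diabetic (1) "images" of size 2x2.
imgs = [[[0, 0], [0, 0]]] * 5 + [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
labs = [0] * 5 + [1] * 2
bal_imgs, bal_labs = balance_by_augmentation(imgs, labs)
print(bal_labs.count(0), bal_labs.count(1))
```

Class weighting in the loss function is a lighter-weight alternative that leaves the data unchanged. Which approach works better depends on the dataset and model.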
Data privacy in ML/DL diabetes research is maintained through methods like data anonymization
and differential privacy, which help protect sensitive medical information. Looking ahead,
future advancements in data augmentation techniques could address class imbalance issues,
while hybrid models that integrate imaging data with clinical and genetic information may
enhance diagnostic precision. Explainability methods, which clarify how models reach their
conclusions, can build clinician trust. Real-time monitoring through remote imaging devices
and wearables could further transform diabetes care by enabling continuous, proactive management.
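Differential privacy is typically achieved by adding calibrated noise to aggregate results. The sketch below is minimal and shows the Laplace mechanism for a counting query, such as how many patients in a cohort are diabetic. Adding or removing one patient changes such a count by at most 1, so Laplace noise with scale 1/ε gives ε-differential privacy. The function names and the ε value are illustrative assumptions. A production system should rely on a vetted differential-privacy library rather than hand-rolled sampling.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy cohort: 30 diabetic and 70 non-diabetic records.
cohort = [{"diabetic": True}] * 30 + [{"diabetic": False}] * 70
noisy = private_count(cohort, lambda r: r["diabetic"], epsilon=0.5, rng=random.Random(42))
print(f"noisy count of diabetic patients: {noisy:.1f}")
```

A smaller ε gives stronger privacy but noisier counts. Choosing ε is therefore a policy decision as much as a technical one.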
These advancements hold great promise for enhancing diabetes diagnosis and management,
contributing to more personalized and timely healthcare interventions.
Keywords
Data, Data Analytics, Diagnosis