Privacy preserving pattern mining from uncertain databases

Date

2024

Publisher

UNIVERSITY BBA

Abstract

With advances in data analysis and processing techniques, the release of micro-data for research purposes, such as disease outbreak studies or economic pattern analysis, has become prevalent. These datasets are valuable to researchers, but they often contain sensitive information that poses privacy risks to individuals. Privacy-preserving data mining (PPDM) has emerged as a critical field that addresses these concerns by concealing sensitive information while still allowing useful insights to be extracted. This task is NP-hard and requires balancing the concealment of sensitive itemsets against the preservation of non-sensitive ones. Numerous algorithms have been developed for deterministic databases, where information is binary (an item is either present or absent). This thesis explores a novel PPDM approach for uncertain databases, where information is represented by probabilistic values. The sanitization process, which aims to hide sensitive information, introduces side effects: hiding failure, missing cost, artificial cost, and dissimilarity. These side effects are treated as objective functions to be minimized. PPDM is therefore formulated as a multi-objective optimization problem and solved with metaheuristic algorithms. Specifically, the NSGA-II algorithm is applied, leading to the development of the NSGAII4ID algorithm for deterministic databases. This algorithm hides sensitive frequent itemsets by removing selected items from chosen transactions, an approach the thesis presents as a pioneering effort in privacy-preserving data mining. For uncertain databases, a novel algorithm named U-NSGAII4ID is proposed; it addresses the multi-objective optimization problem by encoding a set of items to remove from selected transactions.
Additionally, three heuristic approaches for PPDM in uncertain databases are introduced: the aggregate approach, which removes transactions; the disaggregate approach, which removes selected items from each transaction; and the hybrid approach, which combines the two. Experimental evaluations compare these approaches and demonstrate their effectiveness in hiding sensitive itemsets in uncertain databases.
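
The side effects named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it assumes the commonly used expected-support model for uncertain databases (the support of an itemset is the sum, over transactions, of the product of its item probabilities) and the usual PPDM definitions of hiding failure, missing cost, and artificial cost as itemset counts. The function names, the toy database, and the threshold are hypothetical, and dissimilarity is omitted because the abstract does not define it.

```python
from itertools import combinations
from math import prod


def expected_support(itemset, db):
    """Assumed model: sum over transactions of the product of item probabilities."""
    return sum(prod(t.get(i, 0.0) for i in itemset) for t in db)


def frequent_itemsets(db, min_esup, max_size=2):
    """Brute-force enumeration of itemsets whose expected support meets the threshold."""
    items = sorted({i for t in db for i in t})
    return {
        frozenset(c)
        for k in range(1, max_size + 1)
        for c in combinations(items, k)
        if expected_support(c, db) >= min_esup
    }


def side_effects(original, sanitized, sensitive, min_esup, max_size=2):
    """Return (hiding_failure, missing_cost, artificial_cost) as itemset counts."""
    before = frequent_itemsets(original, min_esup, max_size)
    after = frequent_itemsets(sanitized, min_esup, max_size)
    sensitive = {frozenset(s) for s in sensitive}
    hiding_failure = len(sensitive & after)            # sensitive, still frequent
    missing_cost = len((before - sensitive) - after)   # non-sensitive, lost
    artificial_cost = len(after - before)              # newly frequent
    return hiding_failure, missing_cost, artificial_cost


# Hypothetical toy database: each transaction maps an item to its probability.
db = [
    {"a": 0.9, "b": 0.8, "c": 0.5},
    {"a": 0.6, "b": 0.7},
    {"a": 0.8, "c": 0.9},
    {"b": 0.5, "c": 0.4},
]
sensitive = [{"a", "b"}]

# Disaggregate-style sanitization: remove item "b" from the first transaction.
item_removed = [{"a": 0.9, "c": 0.5}] + db[1:]
# Aggregate-style sanitization: remove the first transaction entirely.
transaction_removed = db[1:]

print(side_effects(db, item_removed, sensitive, min_esup=1.0))
print(side_effects(db, transaction_removed, sensitive, min_esup=1.0))
```

In this toy case both sanitizations hide the sensitive itemset, but removing the whole transaction also drops a non-sensitive frequent itemset, which is the kind of trade-off the objective functions are meant to capture. Because deletion can only lower expected supports, the artificial cost stays at zero in a deletion-only sketch like this one.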
