Abstract:
Record Linkage (RL) is the process of identifying records that refer to the same real-world entity. Several RL approaches have been proposed in the literature, but most of them were introduced without a technique for controlling block sizes. In this thesis, we propose an enhanced K-Modes-based RL approach in which a new block size control mechanism is introduced as a post-processing step after blocking. Experiments on a real-world dataset show satisfactory results, with most of the duplicate records detected.