Proposition and evaluation of an entity linking system based on machine learning methods

dc.contributor.author: Toukali, Sabar
dc.contributor.author: Djaafri, Badre DDine
dc.date.accessioned: 2022-01-03T08:53:22Z
dc.date.available: 2022-01-03T08:53:22Z
dc.date.issued: 2021
dc.description.abstract (en_US): Named entity linking is the task of linking an ambiguous entity mention to the corresponding entry in a knowledge base. Current methods have mostly relied on large collections of unstructured text to learn entity representations, and harvesting entities from these collections and linking them is a major challenge. To address this problem, we propose an efficient linking method based on machine learning techniques and algorithms. We treat entity linking as a binary classification problem: we train and test the model on principal features (the mention and its candidate) together with additional features (the mention context and the candidate context). We use three ML methods; one of them, random forest, completes the task with positive and satisfactory results. Keywords: Entity linking, Machine learning, Knowledge base, Binary classification, Natural language processing. The abstract and keywords are also given in French (Résumé, Mots clés) and Arabic (ملخص, الكلمات المفتاحية), with the same content as the English version.
dc.identifier.issn: MM/643
dc.identifier.uri: http://10.10.1.6:4000/handle/123456789/1637
dc.language.iso (en_US): en
dc.publisher (en_US): Université Mohamed el-Bachir el-Ibrahimi Bordj Bou Arréridj, Faculté de Mathématique et Informatique
dc.subject (en_US): Entity linking, Machine learning, Knowledge base, Binary classification, Natural language processing
dc.title (en_US): Proposition and evaluation of an entity linking system based on machine learning methods
dc.type (en_US): Thesis
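The abstract frames linking as binary classification over (mention, candidate) pairs, with the mention and candidate as principal features and their contexts as additional ones. The record does not define the actual features, so the following Python sketch assumes two illustrative ones: a character-level name similarity (`difflib.SequenceMatcher`) and a word-set Jaccard overlap between the mention context and the candidate context. To stay dependency-free it uses a small hand-written logistic regression (one of the three model families the thesis compares) rather than random forest; the toy pairs and helper names such as `link` are hypothetical.

```python
import math
from difflib import SequenceMatcher


# Hypothetical features for a (mention, candidate) pair; the thesis's actual
# feature set is not given in this record.
def name_similarity(mention: str, candidate_title: str) -> float:
    """Character-level similarity of mention and candidate title, in [0, 1]."""
    return SequenceMatcher(None, mention.lower(), candidate_title.lower()).ratio()


def context_overlap(mention_context: str, candidate_context: str) -> float:
    """Jaccard overlap of the word sets of the two contexts, in [0, 1]."""
    a = set(mention_context.lower().split())
    b = set(candidate_context.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def features(mention, mention_context, title, title_context):
    return [name_similarity(mention, title), context_overlap(mention_context, title_context)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x) -> float:
    """Probability that the pair described by feature vector x is a correct link."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def link(model, mention, mention_context, candidates):
    """Return the title of the (title, context) candidate most likely to be correct."""
    best = max(candidates,
               key=lambda c: predict_proba(model, features(mention, mention_context, c[0], c[1])))
    return best[0]


# Toy labelled pairs (hypothetical): (mention, mention context, title, title context, label).
TOY_PAIRS = [
    ("Paris", "the capital of france", "Paris", "capital city of france", 1),
    ("Paris", "the capital of france", "Paris Hilton", "american media personality", 0),
    ("Jaguar", "a british car maker", "Jaguar Cars", "british maker of luxury cars", 1),
    ("Jaguar", "a british car maker", "Jaguar", "large wild cat of the americas", 0),
]
X = [features(m, mc, t, tc) for m, mc, t, tc, _ in TOY_PAIRS]
y = [label for *_, label in TOY_PAIRS]
model = train_logistic_regression(X, y)

print(link(model, "Jaguar", "the new car from the british maker",
           [("Jaguar (cat)", "large wild cat of the americas"),
            ("Jaguar (car)", "british car maker")]))  # → Jaguar (car)
```

At prediction time every candidate for a mention is scored and the highest-probability one is returned. A library classifier such as scikit-learn's `RandomForestClassifier` could be trained on the same feature rows in place of the hand-written model.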

Files

Original bundle

Name: Final_Memoir.pdf
Size: 1.76 MB
Format: Adobe Portable Document Format
Description:
In this thesis, we presented two approaches to the named entity linking problem: the first relies on probabilities and the second on machine learning algorithms. Chapter one introduced entity linking in general: what the task is and which features are exploited to build linking systems, with particular attention to the well-known approaches used to tackle it. Chapter two presented four systems, including Cucerzan's system and the UKP-UBC system, and discussed their perspectives and results. In chapter three, we built two searchable alias dictionaries that generate candidate Wikipedia articles for given mentions; each has its own structure, but they are quite similar. We generated two datasets, one based on Wikipedia and one based on the KDWD dataset, each tied to one of the dictionaries. We used these two datasets separately and combined to train logistic regression, decision tree and random forest models. Our models reach 94% and 92% accuracy on the combined dataset, obtained by adding rows from the KDWD dataset to the dataset we built from Wikipedia articles and the Wikidata site. The added rows all belong to class '1', so adding them balanced the dataset, which made this result easier to achieve. As future work, we plan to apply more machine learning methods to our dataset, which may yield higher results, and to explore deep learning methods such as RNNs. We also plan to add more features that could prove highly informative and make the models more effective. Finally, since our native language is Arabic, we plan to build a similar dataset from Arabic Wikipedia articles, apply the same ML methods to it, and monitor their effectiveness.
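Chapter three's alias dictionaries map a surface form to candidate Wikipedia articles, and the labelled datasets are built from those candidates. The record does not describe either dictionary's structure, so the following Python sketch assumes a plain mapping from a normalized alias to a set of titles; the alias pairs, annotations and function names are hypothetical. It also reports the share of class-1 rows, since the description attributes the final accuracy partly to balancing the dataset.

```python
from collections import defaultdict


def normalize(alias: str) -> str:
    """Lower-case and collapse whitespace so lookups tolerate spacing and case."""
    return " ".join(alias.lower().split())


def build_alias_dictionary(alias_pairs):
    """Map each normalized alias to the set of Wikipedia titles it may refer to."""
    index = defaultdict(set)
    for alias, title in alias_pairs:
        index[normalize(alias)].add(title)
    return index


def candidates(index, mention):
    """Candidate titles for a mention, sorted for reproducibility; [] if unknown."""
    # dict.get does not insert missing keys, even on a defaultdict.
    return sorted(index.get(normalize(mention), ()))


def make_rows(index, annotated_mentions):
    """Label each (mention, candidate) pair 1 if it is the gold entity, else 0."""
    rows = []
    for mention, gold_title in annotated_mentions:
        for title in candidates(index, mention):
            rows.append((mention, title, 1 if title == gold_title else 0))
    return rows


def positive_fraction(rows):
    """Share of class-1 rows; 0.5 means the dataset is balanced."""
    return sum(label for _, _, label in rows) / len(rows) if rows else 0.0


# Hypothetical (alias, title) pairs, e.g. as could be harvested from titles and redirects.
ALIAS_PAIRS = [
    ("Paris", "Paris"),
    ("Paris", "Paris Hilton"),
    ("Paris", "Paris, Texas"),
    ("City of Light", "Paris"),
    ("Jaguar", "Jaguar"),
    ("Jaguar", "Jaguar Cars"),
]
# Hypothetical gold annotations: (mention as written in text, correct title).
ANNOTATED = [("paris", "Paris"), ("Jaguar", "Jaguar Cars")]

index = build_alias_dictionary(ALIAS_PAIRS)
rows = make_rows(index, ANNOTATED)
for row in rows:
    print(row)
print(f"positive fraction: {positive_fraction(rows):.2f}")  # 2 of 5 rows → 0.40
```

Each ambiguous mention contributes at most one class-1 row and possibly several class-0 rows, so datasets built this way tend to be skewed toward class 0. Checking the positive fraction matters because accuracy can look high on a skewed dataset even for a classifier that always predicts the majority class.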

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission