Proposition and evaluation of an entity linking system based on machine learning methods

Toukali, Sabar; Djaafri, Badre DDine

dc.contributor.author	Toukali, Sabar
dc.contributor.author	Djaafri, Badre DDine
dc.date.accessioned	2022-01-03T08:53:22Z
dc.date.available	2022-01-03T08:53:22Z
dc.date.issued	2021
dc.description.abstract	Named entity linking is the task of linking an ambiguous entity mention to its corresponding entry in a knowledge base. Current methods have mostly focused on learning entity representations from large unstructured text collections. Harvesting entities from these collections and linking them is a major challenge. As a solution to this entity linking problem, we propose an efficient linking method that uses machine learning techniques and algorithms. We treat the task as a binary classification problem, using the mention and its candidate as principal features, together with the mention context and the candidate context, to train and test the model. We use three machine learning methods, one of which, random forest, completes the task with positive and satisfactory results. Keywords: entity linking, machine learning, knowledge base, binary classification, natural language processing.	en_US
dc.identifier.issn	MM/643
dc.identifier.uri	http://10.10.1.6:4000/handle/123456789/1637
dc.language.iso	en	en_US
dc.publisher	Université Mohamed el-Bachir el-Ibrahimi Bordj Bou Arréridj Faculté de Mathématique et Informatique	en_US
dc.subject	Entity linking, Machine learning, Knowledge base, Binary classification, Natural language processing	en_US
dc.title	Proposition and evaluation of an entity linking system based on machine learning methods	en_US
dc.type	Thesis	en_US
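
The abstract frames entity linking as binary classification over (mention, candidate) pairs. The sketch below shows one way such training rows could be built; it is an illustration only, not the thesis implementation. The two features (string similarity and context word overlap), the function names, and the sample data are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase whitespace tokenization; a deliberately simple assumption."""
    return set(text.lower().split())


def pair_features(mention, mention_context, candidate, candidate_context):
    """Return two illustrative features for one (mention, candidate) pair."""
    # Surface similarity between the mention string and the candidate title.
    name_sim = SequenceMatcher(None, mention.lower(), candidate.lower()).ratio()
    # Jaccard overlap between the words around the mention and the candidate text.
    a, b = tokens(mention_context), tokens(candidate_context)
    context_overlap = len(a & b) / len(a | b) if a | b else 0.0
    return [name_sim, context_overlap]


def build_rows(mention, mention_context, candidates, gold_title):
    """One row per candidate; label 1 for the correct entity, 0 otherwise."""
    rows = []
    for title, description in candidates.items():
        features = pair_features(mention, mention_context, title, description)
        rows.append((features, 1 if title == gold_title else 0))
    return rows


# Hypothetical candidates and context, invented for this example.
candidates = {
    "Paris": "capital city of France on the river Seine",
    "Paris Hilton": "American media personality and businesswoman",
}
rows = build_rows("Paris", "she flew to the capital of France", candidates, "Paris")
```

Rows of this shape (a feature vector plus a 0/1 label) are what a binary classifier such as logistic regression, a decision tree, or a random forest would be trained on.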

Files

Original bundle

Name: Final_Memoir.pdf
Size: 1.76 MB
Format: Adobe Portable Document Format
Description: In this thesis, we present two approaches to the named entity linking problem. The first relies on probabilities and the second on machine learning algorithms. Chapter one introduces the entity linking problem in general: what it is, which features are exploited to build linking systems, and the familiar approaches used to handle the problem. Chapter two presents four systems, such as Cucerzan and the UKP-UBC system, and discusses their perspectives and results. In chapter three, we build two searchable alias dictionaries to generate referent Wikipedia articles for given mentions. Each has its own structure, but the two do not differ greatly. We generate two datasets, one based on Wikipedia and one on the KDWD dataset, each associated with one dictionary. We use these two datasets separately and in combination to train our models with logistic regression, decision tree, and random forest. Our models achieve 94 % and 92 % accuracy on the combined dataset, obtained by adding rows from the KDWD dataset to the dataset we built from Wikipedia articles and Wikidata. Those rows belong to class ‘1’, and adding them balances the dataset, which made this result easier to achieve. As future work, we plan to apply more machine learning methods to our dataset, which may yield higher results, and to explore deep learning methods such as RNNs. We also considered adding more features to the dataset, which could be high-value features and make the models more efficient. Because Arabic is our native language, we plan to construct a similar dataset from Arabic Wikipedia articles, apply these ML methods to it, and monitor their effectiveness.
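
The description mentions searchable alias dictionaries that generate referent Wikipedia articles for a given mention. A minimal sketch of that idea follows, assuming a plain in-memory mapping from normalized alias strings to article titles. The structure of the thesis dictionaries is not given here, so the function names, the normalization, and the sample alias pairs are hypothetical.

```python
from collections import defaultdict


def build_alias_dictionary(alias_pairs):
    """Map each normalized alias to the set of article titles it may refer to."""
    dictionary = defaultdict(set)
    for alias, title in alias_pairs:
        dictionary[alias.strip().lower()].add(title)
    return dictionary


def generate_candidates(dictionary, mention):
    """Look up a mention; an unknown mention yields an empty candidate list."""
    return sorted(dictionary.get(mention.strip().lower(), set()))


# Hypothetical alias pairs, invented for this example.
alias_pairs = [
    ("Paris", "Paris"),
    ("Paris", "Paris Hilton"),
    ("City of Light", "Paris"),
    ("NYC", "New York City"),
]
alias_dictionary = build_alias_dictionary(alias_pairs)
paris_candidates = generate_candidates(alias_dictionary, "paris")
```

Each candidate returned for a mention would then be paired with that mention and passed to the classifier, which decides whether the pair is a correct link.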

Collections

Master Informatique