Thematic Classification of Multilingual Texts: A Case Study in the Sports Domain

- ABERKANE Ayoub; ATTIA Asm


Files

FIN_D_étude_ayoub_asma.pdf (2.18 MB)

Date

2025

Authors

- ABERKANE Ayoub
- ATTIA Asm

Publisher

University of Bordj Bou Arreridj

Abstract

With the rise of digital technologies and the growing volume of textual content published daily, particularly in the sports domain, organizing such content has become increasingly important. This study processes multilingual sports texts with natural language processing and machine learning techniques in order to classify them according to the topics they address. To standardize the linguistic processing of the multilingual texts, the NLLB machine translation model was used to translate the content into English, which helped improve the thematic segmentation of the texts. Several supervised algorithms, including Naive Bayes, Support Vector Machine (SVM), and Multilayer Perceptron (MLP), were applied to a sports dataset collected from the Kaggle platform. After the data were cleaned and the texts were converted into numerical representations with TF-IDF, the models were trained and compared. The results showed that SVM and MLP achieved the highest accuracy, while Naive Bayes stood out for its execution speed. This study demonstrates the effectiveness of multilingual thematic classification in the sports domain and paves the way for future improvements using more advanced language models.
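The record does not include the authors' code, so the following is only a minimal, dependency-free Python sketch of the TF-IDF plus Naive Bayes stage of a pipeline like the one described. It assumes a simple regex tokenizer as the cleaning step, the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default, and a tiny hypothetical English corpus (standing in for texts already translated by NLLB; it is not the Kaggle dataset). The names `tokenize`, `fit_idf`, `tfidf` and the toy data are illustrative assumptions; the translation step and the SVM and MLP models are not reproduced here.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Minimal cleaning (assumption): lowercase, keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    # Smoothed IDF, matching scikit-learn's TfidfVectorizer defaults:
    # idf(t) = ln((1 + N) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    # Raw term counts times IDF, then L2-normalised; terms unseen in training are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


class MultinomialNB:
    # Multinomial Naive Bayes over non-negative feature weights,
    # with additive (Laplace) smoothing controlled by alpha.
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.vocab = {t for v in vectors for t in v}
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        totals = {c: defaultdict(float) for c in self.classes}
        for v, c in zip(vectors, labels):
            for t, w in v.items():
                totals[c][t] += w
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {
                t: math.log((totals[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, vec):
        # Pick the class with the highest log-prior plus weighted log-likelihood.
        def score(c):
            return self.log_prior[c] + sum(
                w * self.log_lik[c][t] for t, w in vec.items() if t in self.vocab
            )
        return max(self.classes, key=score)


# Hypothetical toy corpus, as if already translated into English.
train = [
    ("the striker scored a late goal in the football match", "football"),
    ("the goalkeeper saved a penalty and the team won the football match", "football"),
    ("the tennis player won the final set on the clay court", "tennis"),
    ("she served an ace to win the tennis championship", "tennis"),
    ("he made a three point shot in the basketball game", "basketball"),
    ("the basketball team won the game with a late dunk", "basketball"),
]
docs = [d for d, _ in train]
labels = [c for _, c in train]
idf = fit_idf(docs)
model = MultinomialNB().fit([tfidf(d, idf) for d in docs], labels)
print(model.predict(tfidf("she won the tennis final", idf)))  # → tennis
```

In practice the same stages map onto scikit-learn's `TfidfVectorizer` and `MultinomialNB`, and swapping the classifier for `LinearSVC` or `MLPClassifier` would cover the SVM and MLP comparisons; the exact settings used in the study are not given in this record.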

Keywords

natural language processing, text classification, sport, TF-IDF, SVM, Naive Bayes, MLP, mBERT, NLLB.

URI

https://dspace.univ-bba.dz/handle/123456789/1015

Collections

Master Informatique
