Thematic Classification of Multilingual Texts: A Case Study in the Sports Domain

dc.contributor.author ABERKANE Ayoub
dc.contributor.author ATTIA Asm
dc.date.accessioned 2025-11-11T12:59:17Z
dc.date.issued 2025
dc.description.abstract With the rapid pace of digital development and the growing volume of textual content published every day, particularly in the sports domain, organizing this content has become increasingly important. This study processes multilingual sports texts with natural language processing and machine learning techniques in order to classify them by the topics they address. To standardize the linguistic processing of the multilingual texts, the NLLB machine translation model was used to translate the content into English, which helped improve the thematic segmentation of the texts. Several supervised algorithms, including Naive Bayes, Support Vector Machine (SVM), and Multilayer Perceptron (MLP), were applied to a sports dataset collected from the Kaggle platform. After the data were cleaned and the texts converted into numerical representations with TF-IDF, the models were trained and compared. The results showed that SVM and MLP achieved the best accuracy, while Naive Bayes stood out for its execution speed. This study demonstrates the effectiveness of multilingual thematic classification in the sports domain and paves the way for future improvements using more advanced language models.
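The TF-IDF plus Naive Bayes stage summarized in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes the texts have already been translated into English (the NLLB step is not shown), uses a simplified tokenizer as a stand-in for the cleaning step, and trains on a tiny invented corpus rather than the Kaggle dataset. The names `tokenize`, `fit_idf`, `tfidf` and `TfidfNaiveBayes` are hypothetical helpers written for this illustration, not the authors' code. The IDF follows the smoothed formula used by default in scikit-learn, without L2 normalization; whether the thesis used exactly this variant is an assumption.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simplified cleaning (assumption): lowercase and keep alphabetic tokens only
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    # Smoothed IDF: ln((1 + N) / (1 + df(t))) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf(doc, idf):
    # Raw term count times IDF; terms never seen during fitting are ignored
    counts = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


class TfidfNaiveBayes:
    """Hypothetical multinomial Naive Bayes with Laplace smoothing over TF-IDF weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        vocab = {t for v in vectors for t in v}
        class_weights = defaultdict(Counter)  # summed TF-IDF weight per class and term
        for v, y in zip(vectors, labels):
            class_weights[y].update(v)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            denom = sum(class_weights[c].values()) + self.alpha * len(vocab)
            self.log_lik[c] = {
                t: math.log((class_weights[c][t] + self.alpha) / denom) for t in vocab
            }
        return self

    def predict(self, v):
        # Choose the class maximizing log P(c) + sum over terms of w_t * log P(t | c)
        def score(c):
            lik = self.log_lik[c]
            return self.log_prior[c] + sum(w * lik[t] for t, w in v.items() if t in lik)

        return max(self.classes, key=score)


# Hypothetical toy corpus, already in English as it would be after translation
train_texts = [
    "the striker scored a late goal in the football match",
    "the goalkeeper saved a penalty in the league match",
    "she won the tennis final in straight sets",
    "the tennis player served an ace to win the set",
    "the team won the basketball game with a late three pointer",
    "he dunked twice as the basketball team took the lead",
]
train_labels = ["football", "football", "tennis", "tennis", "basketball", "basketball"]

idf = fit_idf(train_texts)
model = TfidfNaiveBayes().fit([tfidf(t, idf) for t in train_texts], train_labels)
print(model.predict(tfidf("a penalty goal decided the football match", idf)))  # prints football
```

Naive Bayes only needs to sum weights per class during training, which is consistent with the abstract's observation that it was the fastest model. Comparing it with SVM and MLP is usually done with a library such as scikit-learn (`TfidfVectorizer`, `MultinomialNB`, `LinearSVC`, `MLPClassifier`); this record does not state which library the thesis used.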
dc.identifier.issn MM/915
dc.identifier.uri https://dspace.univ-bba.dz/handle/123456789/1015
dc.language.iso fr
dc.publisher University of Bordj Bou Arreridj
dc.subject natural language processing
dc.subject text classification
dc.subject sport
dc.subject TF-IDF
dc.subject SVM
dc.subject Naive Bayes
dc.subject MLP
dc.subject mBERT
dc.subject NLLB
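dc.title Thematic Classification of Multilingual Texts: A Case Study in the Sports Domain (original French title: Classification thématique des textes multilingue Etude de cas dans le domaine de sport)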
dc.type Thesis

Files

Original bundle

Name: FIN_D_étude_ayoub_asma.pdf
Size: 2.18 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission