Thematic Classification of Multilingual Texts: A Case Study in the Sports Domain (original title: Classification thématique des textes multilingue Etude de cas dans le domaine de sport)

- Authors: ABERKANE Ayoub; ATTIA Asm
- Date: 2025-11-11 (2025)
- Identifier: MM/915
- URI: https://dspace.univ-bba.dz/handle/123456789/1015
- Abstract: With the growth of digital publishing and the rising volume of textual content released every day, particularly in the sports domain, organizing such content has become increasingly important. This study processes multilingual sports texts with natural language processing and machine learning techniques in order to classify them by the topics they address. To standardize the linguistic processing of multilingual texts, the machine translation model NLLB was used to translate the content into English, which contributed to improving the thematic segmentation of the texts. Several supervised algorithms were applied, including Naive Bayes, Support Vector Machine (SVM), and Multilayer Perceptron (MLP), on a sports dataset collected from the Kaggle platform. After data cleaning and conversion of the texts into numerical representations using TF-IDF, the models were trained and compared. Results showed that SVM and MLP achieved the best performance in terms of accuracy, while the Naive Bayes model stood out for its execution speed. This study demonstrates the effectiveness of multilingual thematic classification in the sports domain and paves the way for future improvements using more advanced language models.
- Language: fr
- Keywords: natural language processing, text classification, sport, TF-IDF, SVM, Naive Bayes, MLP, mBERT, NLLB
- Type: Thesis
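
The pipeline summarized in the abstract (texts converted to TF-IDF vectors, then passed to a supervised classifier) can be illustrated with a minimal sketch. This is not the thesis implementation: the four-sentence corpus, the labels, and every function name below are hypothetical, the NLLB translation step is assumed to have already produced English text, and Naive Bayes is shown only because it is the simplest of the three classifiers to write without external libraries. An SVM or MLP would consume the same TF-IDF vectors.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, assumed to be already translated into English.
TRAIN = [
    ("the striker scored a late goal in the football match", "football"),
    ("the goalkeeper saved a penalty in the football final", "football"),
    ("she won the tennis match with a strong serve", "tennis"),
    ("the tennis player broke serve in the second set", "tennis"),
]

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real cleaning)."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Map a token list to {term: tf * idf}, ignoring unseen terms."""
    tf = Counter(t for t in doc if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def train_nb(samples, idf, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing."""
    class_docs = Counter()
    weights = defaultdict(Counter)
    for text, label in samples:
        class_docs[label] += 1
        for t, w in tfidf(tokenize(text), idf).items():
            weights[label][t] += w
    model = {}
    for label, n_docs in class_docs.items():
        total = sum(weights[label].values()) + alpha * len(idf)
        log_prior = math.log(n_docs / len(samples))
        log_lik = {t: math.log((weights[label][t] + alpha) / total) for t in idf}
        model[label] = (log_prior, log_lik)
    return model

def predict(text, model, idf):
    """Return the label with the highest log-posterior score."""
    vec = tfidf(tokenize(text), idf)
    scores = {
        label: log_prior + sum(w * log_lik[t] for t, w in vec.items())
        for label, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)

idf = fit_idf([tokenize(text) for text, _ in TRAIN])
model = train_nb(TRAIN, idf)
print(predict("a late goal decided the football final", model, idf))
print(predict("her serve won the tennis set", model, idf))
```

On a corpus this small the result only shows that the mechanics work; it says nothing about the accuracy or speed comparison reported in the abstract.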