Fouille d’épisodes à partir de données incertaines (Episode mining from uncertain data)
dc.contributor.author | Ouarem, Oualid | |
dc.date.accessioned | 2024-06-20T11:24:04Z | |
dc.date.available | 2024-06-20T11:24:04Z | |
dc.date.issued | 2024-06 | |
dc.description.abstract | Data mining is a critical process in the discovery of knowledge from data. Its primary objective is to extract interesting patterns that implicitly indicate significant relationships between items. Different branches of data mining manipulate various types of data. Episode mining is a subfield of data mining that aims to uncover valuable knowledge from temporal data in the form of a single, long sequence of events. The sequence may not always consist of certain data; it may be noisy, drawn from multiple sources, or collected with errors. Consequently, there is a need to design algorithms that extract frequent episodes from uncertain data. This thesis proposes novel algorithms for frequent episode and episode rule mining in the case of certain data, and also addresses the challenges associated with these tasks in the context of uncertain data. | en_US |
dc.identifier.issn | MD/23 | |
dc.identifier.uri | http://10.10.1.6:4000/handle/123456789/5034 | |
dc.language.iso | en | en_US |
dc.publisher | UNIVERSITY BBA | en_US |
dc.subject | Episode mining, episode rules, NONEPI, EMDO, UEMDO, prediction, uncertain data | en_US |
dc.title | Fouille d’épisodes à partir de données incertaines (Episode mining from uncertain data) | en_US |
dc.type | Thesis | en_US |
Files
Original bundle
- Name:
- La thèse.pdf
- Size:
- 900.14 KB
- Format:
- Adobe Portable Document Format
- Description:
- Data mining is a large field that aims to discover interesting insights from a large volume of data. This thesis brings new contributions to an important sub-field of data mining that aims at extracting important patterns from data. We have focused particularly on extracting frequent episodes from large sequences of events. The frequent episode mining framework was designed to analyze a large sequence of events. It has been an active area of research in recent decades, as evidenced by numerous published studies. It helps to analyze temporal data, understand the behavior of systems, detect abnormalities, and predict the future. In this thesis, we have presented an up-to-date review of the state of the art of this framework and its variants, as well as novel approaches for certain and uncertain data, which are summarized in what follows:
• We proposed a novel approach for episode and episode rule mining from certain sequences of events. It includes two algorithms. The first one mines frequent serial episodes from an event sequence under the non-overlapping occurrence-based frequency. The second algorithm uses the first one to mine episode rules based on a depth-first search strategy. The algorithm yielded good results compared to existing methods. This contribution is applicable to prediction tasks in several domains, such as medicine and stock market buy/sell actions, because the form of its rules can describe the prefix extensibility of subsequences.
• Because multiple frequency definitions exist in episode mining, we have proposed novel algorithms for frequent episode and episode rule discovery under the distinct occurrence-based frequency, called EMDO and EMDO-P, respectively. This frequency definition can detect episodes that overlap without sharing common timestamps. This approach captures episodes when the order of a given episode is not considered. The proposed rules are very useful for analyzing the relationships between a wide range of pairs of episodes, and they include the rules proposed in the first contribution. For example, the approach can be applied to the click logs of any website to obtain a set of rules that help recommend to users the next pages to visit, in order to enhance the performance of such a website.
• To address data uncertainty, some proposed methods calculate the existential probability of episodes in an event sequence. However, few studies deal with event probabilities. As a novel contribution, we proposed an extension of EMDO and EMDO-P to the existential probability of episodes in a sequence with uncertain events. The new extension calculates the expected support of an episode based on the probability of its distinct occurrences.
Research opportunities
There are several opportunities for research on episode mining. Some of those mentioned in this thesis include:
• Enhancing the performance of existing episode mining algorithms. Many algorithms are still resource-intensive, especially when dealing with long or intricate event sequences. Furthermore, most algorithms developed for episode mining have only been applied in centralized systems. Therefore, creating algorithms that operate effectively on distributed systems, or developing parallel mining algorithms to improve speed and scalability, is a significant challenge in the field of episode mining.
• Extending the existing algorithms to consider more complex types of episodes and episode rules. Many existing algorithms primarily focus on identifying a single type of episode and/or episode rule. However, many applications involve numerous simultaneous events. Recent research has begun to address this issue, such as in medical applications [59]. Nevertheless, this area of study requires further investigation. Furthermore, only a few works consider episodes with partial order, which correspond more closely to the reality of many systems.
• Finding more applications that depend on the episode mining approach. Most current research focuses on developing algorithms based on theoretical concepts rather than on practical application. As a result, temporal data analysis with episode and episode rule discovery remains an understudied topic. To address this gap, it is recommended to use episode mining to analyze any recorded data, particularly in fields such as cybersecurity, medicine, economics, and finance, where the order of elements is critical.
• Another very important future research area is to find extensions of existing state-of-the-art methods that incorporate techniques from deep learning, which is today a powerful and trending field in artificial intelligence.
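The two notions the summary above relies on, the non-overlapping occurrence-based frequency of a serial episode and the expected support under uncertain events, can be illustrated with a minimal Python sketch. This is not the thesis's NONEPI, EMDO, or UEMDO algorithm. It assumes a sequence sorted by timestamp with one event per timestamp, a common greedy left-to-right scan for counting, and independent event probabilities. The names `non_overlapping_occurrences`, `expected_support`, and the sample data are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Assumption: an event is a (timestamp, event_type) pair, the sequence is
# sorted by timestamp, and no two events share a timestamp.
Event = Tuple[int, Hashable]


def non_overlapping_occurrences(
    sequence: Sequence[Event], episode: Sequence[Hashable]
) -> List[List[int]]:
    """Greedily collect non-overlapping occurrences of a serial episode.

    Each occurrence is the list of timestamps matched to the episode's
    event types, in order. A new occurrence can only start after the
    previous one has been completed.
    """
    if not episode:
        return []
    occurrences: List[List[int]] = []
    current: List[int] = []
    for timestamp, event_type in sequence:
        # Wait for the next event type the episode still needs.
        if event_type == episode[len(current)]:
            current.append(timestamp)
            if len(current) == len(episode):
                occurrences.append(current)
                current = []
    return occurrences


def expected_support(
    occurrences: Sequence[Sequence[int]], probability: Dict[int, float]
) -> float:
    """Sum, over the given occurrences, of the product of event probabilities.

    Assumption (not taken from the thesis): events exist independently, so
    an occurrence exists with the product of its events' probabilities.
    """
    total = 0.0
    for occurrence in occurrences:
        p = 1.0
        for timestamp in occurrence:
            p *= probability[timestamp]
        total += p
    return total


# Hypothetical toy data: event types A, B, C at timestamps 1..6.
toy_sequence = [(1, "A"), (2, "B"), (3, "A"), (4, "C"), (5, "B"), (6, "C")]
toy_probability = {1: 0.9, 2: 0.5, 3: 1.0, 4: 0.7, 5: 0.8, 6: 0.6}

found = non_overlapping_occurrences(toy_sequence, ("A", "B"))
print(found)  # [[1, 2], [3, 5]]
print(expected_support(found, toy_probability))  # 0.9 * 0.5 + 1.0 * 0.8
```

With certain data the support of the episode (A, B) is simply the number of occurrences found. With uncertain events, each occurrence contributes only the probability that all of its events exist. The distinct occurrence-based frequency used by EMDO would require a different occurrence-enumeration step, which this sketch does not attempt.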
License bundle
- Name:
- license.txt
- Size:
- 1.71 KB
- Format:
- Item-specific license agreed to upon submission
- Description: