Architecture Design and implementation of the startup Farmy.ai's data pipeline.

Ouaret, Sami

Faculté des mathématiques et de l'informatique
Master Informatique

dc.contributor.author	Ouaret, Sami
dc.date.accessioned	2022-01-02T09:16:53Z
dc.date.available	2022-01-02T09:16:53Z
dc.date.issued	2021
dc.identifier.issn	MM/ 648
dc.identifier.uri	https://dspace.univ-bba.dz:443/xmlui/handle/123456789/1618
dc.description.abstract	In a data-driven world, companies must work on their data strategy to survive and stay competitive. Most existing data strategies depend on manual labor and leave little room for human creativity in problem solving and value creation. As a result, putting data resources to use takes a long time, and the insights gained become obsolete. At the startup Farmy.ai, after a year and a half in production, we understood that an automated data pipeline is imperative to accelerate the use of our data resources while scaling to new use cases. This project designs a data architecture and implements a scalable data pipeline for Farmy.ai, intended to enable image-based diagnosis of plant diseases. The pipeline starts by regularly retrieving images and their metadata from social media and other sources. It then stores and catalogs the collected data in a cloud data lake, and enriches the stored data with annotations from agriculture experts. Finally, all operations of the pipeline are orchestrated to avoid repetitive manual work.
With rapid technological development, artificial intelligence and big data are the locomotive driving a revolution of innovative solutions, so organizations everywhere strive to integrate data into their business to improve their products and outpace their competitors. Agriculture is an attractive challenge for artificial intelligence and big data. Farmy is an Algerian startup that aims to use artificial intelligence to give farmers a fast and reliable diagnosis of the crop diseases that destroy a large share of crops every year. Because data is the main engine of artificial intelligence, Farmy faced challenges in integrating data from its various sources before it could apply artificial intelligence. Since much of that data is corrupted or irrelevant to Farmy's needs, it must be cleaned and filtered; then, for the relevant data to be ready for use, it must be enriched with additional information by agricultural experts. Carrying out all these operations clearly takes effort, so Farmy decided to build a seamless system that lets it manage them effortlessly. This work presents a design that leverages the data lake architecture and cloud computing to build a robust and reliable pipeline for collecting and processing Farmy's data, enabling the company to apply artificial intelligence and run its business.
In this data-driven world, companies must work on their data strategy to survive and remain competitive. Nevertheless, most existing data strategies require considerable manual effort and do not create the expected value. The startup Farmy, which offers intelligence-based solutions, quickly realized the importance of a robust data architecture. An automated data pipeline also accelerates the development of new solutions. The scope of this project is the design of a data architecture and the implementation of a data pipeline for the startup Farmy. This data pipeline is applied to the automated diagnosis of plant diseases. The first step is to periodically retrieve images and their metadata from social networks and other sources. The collected data are then stored and cataloged in a cloud data lake. Finally, these data are enriched with annotations made by agricultural experts.	en_US
dc.language.iso	en	en_US
dc.publisher	Université Mohamed El Bachir El Ibrahimi de Bordj Bou Arreridj	en_US
dc.title	Architecture Design and implementation of the startup Farmy.ai's data pipeline.	en_US
dc.type	Thesis	en_US