My GSoC Adventure: From Data to Detection
Introduction
In this project, I'm participating in the "Advancing Bird Detection and Classification in Hand-Held Airborne Imagery" initiative. Its primary goals are to refine and optimize the DeepForest model for accurate bird detection and classification in hand-held airborne imagery. I also aim to develop an annotated dataset for training and testing to improve the model's performance. By achieving these objectives, I hope to advance the capabilities of AI in bird detection and contribute to ecological research and conservation efforts.
Before diving deeper, here is a summary of the blog posts that describe my journey in detail.
Blogs Summary
1 - Learning PyTorch and Early Bird Detection Efforts
I kicked off my GSoC journey by diving into PyTorch, enhancing project documentation, and working on the DeepForest model for bird detection. Despite challenges like limited computational power and data quality issues, I managed to train a baseline model with 90% precision but only 10% recall, so its detections were mostly correct but it missed most of the birds. These early weeks taught me the importance of starting small and iterating, setting the stage for future improvements.
link: Blog
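To make the precision and recall figures from this first post concrete, here is a minimal sketch of how the two metrics are computed from detection counts. The counts are hypothetical, chosen only to reproduce the 90% / 10% pattern; they are not the actual numbers from my experiments.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive
    and false-negative detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 90 annotated birds, 10 predicted boxes, 9 of them correct.
precision, recall = precision_recall(tp=9, fp=1, fn=81)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision with low recall means the few boxes the model predicts are usually right, but most annotated birds get no box at all.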
2 - Exploring YOLOv8 for Bird Detection in Aerial Images
I delved into using YOLOv8 for bird detection, tackling challenges like imbalanced classes and a small dataset of 169 high-quality images. Despite these hurdles, I used tools like Roboflow for data preparation and Focal Loss to handle class imbalance. While the initial YOLOv8n model didn’t achieve high accuracy, it set the stage for experimenting with more powerful models like YOLOv8x and Faster R-CNN.
link: Blog
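For readers unfamiliar with Focal Loss, the idea is to down-weight examples the model already classifies confidently so that rare, hard examples contribute more to training. The sketch below is a plain-Python version of the binary formula for a single prediction. The function name and the default values of gamma and alpha are assumptions for illustration, and this is not the code used inside YOLOv8 or in my notebooks.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one prediction.

    p: predicted probability of the positive class
    y: ground-truth label (1 or 0)
    gamma: focusing parameter; 0 reduces to weighted cross-entropy
    alpha: weight given to the positive class
    """
    p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.95, 1)  # confident and correct
hard = binary_focal_loss(0.10, 1)  # confident and wrong
print(f"easy={easy:.5f} hard={hard:.5f}")
```

The factor (1 - p_t) ** gamma shrinks the loss of the easy example far more than that of the hard one, which is what helps with imbalanced classes.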
3 - Enhancing Object Detection with Supervision and Pre-Commit Hooks
In my third GSoC blog, I delved into integrating the Supervision library for object detection and explored essential software engineering practices, like setting up pre-commit hooks. This integration improved annotation capabilities, while pre-commit hooks ensured code quality by catching errors before they reach the CI pipeline. These skills are crucial for any software engineer, and I’m excited to continue applying them in my project.
link: Blog
4 - Enhancing Data Quality and Tackling Class Imbalance
I enhanced our object detection project by merging underrepresented classes and using SAHI to tile large images, which increased dataset size and annotation quality. I converted annotations to COCO format and tracked experiments with Comet ML, finding that DeepForest slightly outperformed YOLOv8. Overall, these improvements have significantly boosted data quality and model performance.
link: Blog
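The two data steps in this post, tiling and COCO conversion, can be sketched in a few lines. This is a simplified illustration written from scratch, not the SAHI implementation: the function names, tile size, and overlap ratio are assumptions. It computes overlapping tile windows for a large image and converts a corner-style box (xmin, ymin, xmax, ymax) into the COCO [x, y, width, height] layout.

```python
def tile_windows(img_w, img_h, tile=400, overlap=0.25):
    """Return (x0, y0, x1, y1) windows that cover the image with overlap."""
    stride = max(1, int(tile * (1 - overlap)))

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, stride))
        positions.append(length - tile)  # last tile sits flush with the edge
        return positions

    return [
        (x, y, min(x + tile, img_w), min(y + tile, img_h))
        for y in starts(img_h)
        for x in starts(img_w)
    ]


def to_coco_bbox(xmin, ymin, xmax, ymax):
    """Convert corner coordinates to COCO [x, y, width, height]."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


windows = tile_windows(1000, 700)
print(len(windows), windows[0], windows[-1])
print(to_coco_bbox(10, 20, 50, 80))
```

The overlap matters because a bird cut in half by a tile border still appears whole in the neighbouring tile.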
5 - New project, New perspective
After the first project, I moved on to a second one: cattle detection from drone imagery. I filtered five datasets, merged them, and used Roboflow to manage them. Initial DeepForest results were poor because the cattle appear as very small objects, which I addressed by applying Slicing Aided Hyper Inference (SAHI) to tile the images. Tiling increased the dataset size but introduced empty tiles. I filtered out the empty images, refined the annotations, and retrained the model, achieving 92% recall and 73% precision. Next, I'll compare models like YOLOv8, find an optimal threshold, and test on additional data.
link: Blog
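The empty-tile filtering step can be illustrated as follows. This is a minimal sketch under my own assumptions (a box belongs to the tile that contains its center, and all names and sample coordinates are hypothetical), not the exact logic from the notebook.

```python
def boxes_for_tile(boxes, tile):
    """Return boxes whose center lies in the tile, in tile-local coordinates."""
    tx0, ty0, tx1, ty1 = tile
    kept = []
    for xmin, ymin, xmax, ymax in boxes:
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        if tx0 <= cx < tx1 and ty0 <= cy < ty1:
            # Clip to the tile, then shift so the tile origin is (0, 0).
            kept.append((
                max(xmin, tx0) - tx0,
                max(ymin, ty0) - ty0,
                min(xmax, tx1) - tx0,
                min(ymax, ty1) - ty0,
            ))
    return kept


def drop_empty_tiles(tiles, boxes):
    """Map each non-empty tile to its local boxes; tiles without boxes are dropped."""
    result = {}
    for tile in tiles:
        local = boxes_for_tile(boxes, tile)
        if local:
            result[tile] = local
    return result


tiles = [(0, 0, 400, 400), (400, 0, 800, 400)]
boxes = [(390, 100, 420, 130)]  # hypothetical annotation straddling the border
kept = drop_empty_tiles(tiles, boxes)
print(kept)
```

Here the first tile is discarded because the only box has its center in the second tile.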
6 - Generative AI and LLMs
This blog explores the transformative impact of Large Language Models (LLMs) and generative AI on various industries. It delves into their core mechanisms, including tokenization, embeddings, and transformer architecture, and explains the stages of pre-training and fine-tuning. The post highlights practical applications, from content creation to healthcare, while also addressing ethical concerns like bias and misinformation. The blog aims to equip machine learning engineers with a deeper understanding of these technologies, their potential, and the ethical considerations surrounding their use.
link: Blog
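As a toy illustration of the first two mechanisms mentioned in this post, the sketch below turns a sentence into token ids and looks up an embedding vector for each id. It is deliberately simplified: real LLMs use subword tokenizers and learned embeddings, whereas this example splits on whitespace and fills the embedding table with random numbers. All names here are hypothetical.

```python
import random


def tokenize(text):
    """Lower-case the text and split on whitespace."""
    return text.lower().split()


def build_vocab(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


def make_embeddings(vocab_size, dim=4, seed=0):
    """Create a random embedding table with one vector per token id."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]


tokens = tokenize("Birds fly and birds sing")
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
table = make_embeddings(len(vocab))
vectors = [table[i] for i in ids]
print(ids)
```

Both occurrences of "birds" map to the same id and therefore to the same vector; a transformer then mixes in context so that the same word can be represented differently in different sentences.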
Coding notebooks
Here are some notebooks I worked on during the program:
Notebook | Link | Details
Data preparation | Link | Prepares the dataset and groups the helper functions I use in one place
Bird detection with YOLOv8 models | Link | Fine-tuning a YOLOv8 model on my UAV bird detection data
Cattle detection with YOLOv8 models | Link | Fine-tuning a YOLOv8 model on my drone cattle detection data
Cattle detection with DeepForest pretrained model | Link | Fine-tuning the DeepForest model for cattle detection
LLM examples notebook | Link | Examples of using the Hugging Face Python libraries for different generative AI tasks
My Contributions to DeepForest
PRs: Link
Issues: Link
Skills
Technical Skills
Deep Learning: Worked with PyTorch, DeepForest, and state-of-the-art models like YOLOv8 and Faster R-CNN for object detection.
Data Preparation and Augmentation: Improved skills in handling datasets, managing class imbalances, and using tools like Roboflow.
Model Evaluation and Optimization: Learned to evaluate models using precision, recall, and other metrics, and optimize them for better performance.
Generative AI and NLP: Explored Large Language Models (LLMs) and used Hugging Face libraries for NLP tasks.
Experiment Tracking: Used tools like Comet ML and WandB to track experiments and analyze results.
Software Engineering Practices: Gained experience in using pre-commit hooks and maintaining code quality in collaborative projects.
Soft Skills
Problem-Solving: Tackled various challenges related to data quality, model accuracy, and limited resources.
Communication and Collaboration: Improved technical writing through blog posts, and worked effectively with mentors and open-source communities.
Project Management: Managed time efficiently, organized tasks, and balanced multiple aspects of the project.
Adaptability: Learned to adjust strategies based on feedback and results, demonstrating flexibility and a willingness to iterate.
Conclusion
Throughout my GSoC journey, I have explored a variety of techniques and models to enhance object detection, from initial work with PyTorch and DeepForest to experimenting with advanced models like YOLOv8 and Faster R-CNN. This experience has been incredibly valuable in developing both technical and soft skills, such as deep learning, data preparation, model evaluation, problem-solving, and effective communication.
Acknowledgments
I would like to express my deepest gratitude to my mentors, Henry Senyondo, Benjamin Weinstein, and Ethan White, for their support and guidance throughout this project. Your insights, feedback, and encouragement have been invaluable in helping me grow both professionally and personally. I am incredibly grateful for the time and effort each of you dedicated to mentoring me, and for the opportunity to learn from such knowledgeable and supportive individuals.
Thank you for making this journey a truly enriching experience.
Written by Muhammed Magdy