Journey with DeepForest(#2): Milestones, Mentorship, and My GSOC Decision

March 28, 2024
Reflecting on a Month of Growth
The past month has been a whirlwind of mid-terms, hackathons, and deep dives into DeepForest’s codebase. Despite the chaos, I’ve made meaningful contributions—fixing bugs, improving documentation, and even reshaping core functionalities. Here’s a recap of my progress and the exciting path ahead!
Key Contributions
1. Enhancing Visualization Callbacks
Problem: The images_callback class used a deprecated function (plot_prediction_dataframe) and had a root_dir path issue.
Solution:
Replaced the deprecated function with plot_results.
Added dynamic color handling for multi-class labels.
Fixed the root_dir column mismatch in DataFrames.
Lesson: Debugging often reveals hidden dependencies (like unexpected Series vs. str types).
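To make the Series vs. str lesson concrete, here is a minimal sketch of the kind of guard that avoids the mismatch. It is not DeepForest's actual code: resolve_root_dir is a hypothetical helper name, and the assumption is that root_dir may arrive either as a single path or as a whole column of identical paths (anything iterable, such as a pandas Series).

```python
import os
from collections.abc import Iterable


def resolve_root_dir(root_dir):
    """Return one directory path from either a single path or a column of paths.

    Hypothetical helper for illustration only; not part of DeepForest.
    """
    # A plain string or path object is already what we want.
    if isinstance(root_dir, (str, os.PathLike)):
        return os.fspath(root_dir)

    # A column-like value (list, tuple, pandas Series, ...) must agree on one path.
    if isinstance(root_dir, Iterable):
        unique_paths = {os.fspath(value) for value in root_dir}
        if len(unique_paths) != 1:
            raise ValueError(
                f"Expected exactly one root_dir, found {len(unique_paths)}"
            )
        return unique_paths.pop()

    raise TypeError(f"Unsupported root_dir type: {type(root_dir).__name__}")
```

Calling resolve_root_dir("/data/images") and resolve_root_dir(["/data/images", "/data/images"]) both give back the same single string, so code downstream can join paths without caring which form it received.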
2. Cleaning Up Documentation
Fixed inconsistencies in code snippets and examples across docs.
Small changes, big impact: Clearer docs = happier contributors!
3. Multi-Model Support for predict_tile
Feature: Enabled predict_tile to accept a list of crop models, with results dynamically labeled (e.g., cropmodel_label_0, cropmodel_score_0).
Testing: Added robust edge-case checks (empty predictions, multi-model collisions).
Why It Matters: More flexible workflows for users working with diverse models.
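The column naming scheme can be sketched in plain Python. This is an illustration under assumptions, not the predict_tile implementation: attach_crop_model_results is a hypothetical function, detections are modeled as a list of dicts, and each crop model's output is modeled as a list of (label, score) pairs in the same order as the detections.

```python
def attach_crop_model_results(detections, per_model_outputs):
    """Add cropmodel_label_i / cropmodel_score_i keys for each model i.

    Hypothetical sketch: `detections` is a list of dicts and
    `per_model_outputs[i]` is a list of (label, score) pairs for model i.
    """
    # Copy the rows so the caller's data is left untouched.
    rows = [dict(detection) for detection in detections]

    for i, outputs in enumerate(per_model_outputs):
        if len(outputs) != len(rows):
            raise ValueError(
                f"Model {i} returned {len(outputs)} results for {len(rows)} detections"
            )
        label_col = f"cropmodel_label_{i}"
        score_col = f"cropmodel_score_{i}"
        for row, (label, score) in zip(rows, outputs):
            # Guard against a collision with an existing column of the same name.
            if label_col in row or score_col in row:
                raise ValueError(f"Column collision for model {i}")
            row[label_col] = label
            row[score_col] = score

    return rows
```

With no detections the function simply returns an empty list, which mirrors the empty-prediction edge case mentioned above.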
4. Migrating Docs to MyST-Markdown
PR #990 (Pending review)
Converted RST docs to MyST (modern Markdown for Sphinx).
Benefits: Better maintainability, richer formatting, and alignment with ReadTheDocs’ recommendations.
Fun Fact: This touched 54 files—patience pays off!
GSOC 2024: My Project Choice
After much deliberation, I’ve decided to pursue Proposal 3: The Airborne Wildlife Benchmark Dataset.
Why This Project?
Problem: Wildlife datasets are fragmented, inconsistently formatted, and rarely ML-ready.
Goal: Create a standardized MillionAnimals benchmark (inspired by MillionTrees) to train a general animal detector.
Tech Stack: PyTorch, COCO annotations, and integration with DeepForest.
Mentor Insights
"The focus is on wrapping datasets into PyTorch loaders—users shouldn’t worry about formats. Think WILDS benchmark meets TorchGeo." — Ben Weinstein
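As a rough idea of what "wrapping datasets into loaders" could look like, here is a small sketch that reads COCO-style annotations and exposes them through __len__ and __getitem__, the same two methods a map-style PyTorch Dataset implements. The class name CocoBoxes and the sample data are made up for illustration, and the sketch deliberately avoids importing torch so it runs on its own; it is not the MillionAnimals design.

```python
from collections import defaultdict


class CocoBoxes:
    """Index COCO-style annotations by image (hypothetical illustration)."""

    def __init__(self, coco):
        # `coco` is a dict with the standard "images", "annotations",
        # and "categories" lists of the COCO detection format.
        self.images = sorted(coco["images"], key=lambda image: image["id"])
        self.names = {cat["id"]: cat["name"] for cat in coco["categories"]}
        self.by_image = defaultdict(list)
        for annotation in coco["annotations"]:
            self.by_image[annotation["image_id"]].append(annotation)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        boxes, labels = [], []
        for annotation in self.by_image.get(image["id"], []):
            # COCO stores boxes as [x, y, width, height]; convert to corners.
            x, y, width, height = annotation["bbox"]
            boxes.append([x, y, x + width, y + height])
            labels.append(self.names[annotation["category_id"]])
        return image["file_name"], {"boxes": boxes, "labels": labels}


# Made-up sample data in COCO layout.
sample = {
    "images": [{"id": 1, "file_name": "flight_001.png"}],
    "annotations": [{"image_id": 1, "category_id": 7, "bbox": [10, 20, 30, 40]}],
    "categories": [{"id": 7, "name": "zebra"}],
}
dataset = CocoBoxes(sample)
file_name, target = dataset[0]
```

Because the class follows the map-style protocol, a real version could subclass torch.utils.data.Dataset and load the image pixels in __getitem__ without changing how users index it.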
Next Steps:
Challenges & Takeaways
Time Management: Balancing exams, hackathons, and OSS contributions is hard but rewarding.
Debugging Mindset: Errors taught me to trace issues upstream.
Community Power: Mentor feedback is gold.
What’s Next?
Finalize the GSOC proposal for MillionAnimals.
Dive deeper into dataset standardization challenges.
Keep contributing to DeepForest’s core.
Final Thoughts
This journey has been a masterclass in open-source collaboration. Every merged PR fuels my motivation, and I’m thrilled to keep pushing forward. Stay tuned for updates on the GSOC proposal and my adventures in wildlife ML!
Written by Abhishek Dimri
