Journey with DeepForest (#2): Milestones, Mentorship, and My GSoC Decision

Abhishek Dimri
2 min read

March 28, 2024

Reflecting on a Month of Growth

The past month has been a whirlwind of mid-terms, hackathons, and deep dives into DeepForest’s codebase. Despite the chaos, I’ve made meaningful contributions: fixing bugs, improving documentation, and even extending core functionality. Here’s a recap of my progress and the exciting path ahead!


Key Contributions

1. Enhancing Visualization Callbacks

PR #969

  • Problem: The images_callback class used a deprecated function (plot_prediction_dataframe) and had a root_dir path issue.

  • Solution:

    • Replaced the deprecated function with plot_results.

    • Added dynamic color handling for multi-class labels.

    • Fixed the root_dir column mismatch in DataFrames.

  • Lesson: Debugging often reveals hidden dependencies (like unexpected Series vs. str types).
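
To make the "dynamic color handling" idea concrete, here is a minimal sketch of one way to give every class label its own color. It is not the code from the PR: the helper name label_colors and the evenly-spaced-hue approach are my own assumptions for illustration, and it only uses Python's standard library.

```python
import colorsys


def label_colors(labels):
    """Map each distinct label to an RGB tuple (0-255) with evenly spaced hues.

    Hypothetical helper for illustration; not DeepForest's implementation.
    """
    unique = sorted(set(labels))  # stable order, so colors are reproducible
    n = len(unique)
    colors = {}
    for i, label in enumerate(unique):
        # Spread hues around the color wheel; fixed saturation and brightness.
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 1.0)
        colors[label] = (round(r * 255), round(g * 255), round(b * 255))
    return colors


palette = label_colors(["Tree", "Bird", "Tree", "Deer"])
for name, rgb in palette.items():
    print(name, rgb)
```

Because the palette is derived from the labels themselves, adding a new class does not require touching a hard-coded color list.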

2. Cleaning Up Documentation

PR #971

  • Fixed inconsistencies in code snippets and examples across docs.

  • Small changes, big impact: Clearer docs = happier contributors!

3. Multi-Model Support for predict_tile

PR #983

  • Feature: Enabled predict_tile to accept a list of crop models, with results dynamically labeled (e.g., cropmodel_label_0, cropmodel_score_0).

  • Testing: Added robust edge-case checks (empty predictions, multi-model collisions).

  • Why It Matters: More flexible workflows for users working with diverse models.
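
The column naming scheme is easier to see in code. The sketch below only mimics the labeling pattern (cropmodel_label_0, cropmodel_score_0, and so on) with plain Python lists. The function label_crop_model_outputs and its input shape are assumptions made for illustration, not predict_tile's real internals.

```python
def label_crop_model_outputs(outputs):
    """Build per-model result columns from a list of (labels, scores) pairs.

    `outputs` holds one (labels, scores) pair per crop model. Hypothetical
    helper for illustration; it is not part of DeepForest's API.
    """
    columns = {}
    n_rows = None
    for i, (labels, scores) in enumerate(outputs):
        if len(labels) != len(scores):
            raise ValueError(f"model {i}: labels and scores differ in length")
        if n_rows is None:
            n_rows = len(labels)
        elif len(labels) != n_rows:
            raise ValueError(f"model {i}: expected {n_rows} rows, got {len(labels)}")
        # The index suffix keeps columns from different models from colliding.
        columns[f"cropmodel_label_{i}"] = list(labels)
        columns[f"cropmodel_score_{i}"] = list(scores)
    return columns


results = label_crop_model_outputs([
    (["Alive", "Dead"], [0.91, 0.77]),  # first crop model
    (["Oak", "Pine"], [0.64, 0.88]),    # second crop model
])
print(sorted(results))
```

An empty list of predictions simply produces empty columns, which is the kind of edge case the tests in the PR were written to cover.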

4. Migrating Docs to MyST-Markdown

PR #990 (Pending review)

  • Converted RST docs to MyST (modern Markdown for Sphinx).

  • Benefits: Better maintainability, richer formatting, and alignment with ReadTheDocs’ recommendations.

  • Fun Fact: This touched 54 files—patience pays off!
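
For anyone curious what a MyST setup looks like on the Sphinx side, here is a small conf.py excerpt. It is a generic sketch, not the configuration from the PR: myst_parser is the standard Sphinx extension for MyST, while the two optional syntax extensions enabled at the end are just examples I picked.

```python
# conf.py (excerpt): illustrative settings, not copied from the PR
extensions = [
    "myst_parser",  # lets Sphinx parse MyST-Markdown sources
]

# Accept both formats so files can be migrated a few at a time.
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}

# Optional MyST syntax extensions (example choices).
myst_enable_extensions = ["colon_fence", "deflist"]
```

Keeping .rst in source_suffix means a half-migrated docs folder still builds, which helps when a change spans dozens of files.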


GSoC 2024: My Project Choice

After much deliberation, I’ve decided to pursue Proposal 3: The Airborne Wildlife Benchmark Dataset.

Why This Project?

  • Problem: Wildlife datasets are fragmented, inconsistently formatted, and rarely ML-ready.

  • Goal: Create a standardized MillionAnimals benchmark (inspired by MillionTrees) to train a general animal detector.

  • Tech Stack: PyTorch, COCO annotations, and integration with DeepForest.

Mentor Insights

"The focus is on wrapping datasets into PyTorch loaders. Users shouldn’t worry about formats. Think WILDS benchmark meets TorchGeo." (Ben Weinstein)

Next Steps:

  1. Study WILDS and TorchGeo workflows.

  2. Prototype a PyTorch dataset loader for wildlife data.
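
As a first rough idea of what such a loader could look like, here is a sketch that indexes COCO-style annotations by image and exposes them through __len__ and __getitem__, the two methods a PyTorch map-style dataset needs. Everything here is an assumption for illustration: the class name CocoBoxDataset and the tiny in-memory annotation dict are made up, and a real version would subclass torch.utils.data.Dataset, load the image pixels, and return tensors.

```python
class CocoBoxDataset:
    """Map-style dataset over COCO-format annotations (illustrative sketch)."""

    def __init__(self, coco):
        self.images = sorted(coco["images"], key=lambda im: im["id"])
        self.categories = {c["id"]: c["name"] for c in coco["categories"]}
        # Group annotations by image so lookups in __getitem__ are cheap.
        self.boxes = {im["id"]: [] for im in self.images}
        for ann in coco["annotations"]:
            x, y, w, h = ann["bbox"]  # COCO stores boxes as [x, y, width, height]
            self.boxes[ann["image_id"]].append({
                "box": [x, y, x + w, y + h],  # convert to [xmin, ymin, xmax, ymax]
                "label": self.categories[ann["category_id"]],
            })

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        return image["file_name"], self.boxes[image["id"]]


# Made-up annotations, only to exercise the class.
coco = {
    "images": [{"id": 1, "file_name": "flight_001.jpg"},
               {"id": 2, "file_name": "flight_002.jpg"}],
    "categories": [{"id": 1, "name": "bird"}, {"id": 2, "name": "seal"}],
    "annotations": [
        {"image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"image_id": 1, "category_id": 2, "bbox": [5, 5, 10, 10]},
    ],
}

dataset = CocoBoxDataset(coco)
file_name, targets = dataset[0]
print(len(dataset), file_name, targets)
```

Images without any annotations still get an empty target list, so background-only tiles stay in the dataset instead of silently disappearing.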


Challenges & Takeaways

  • Time Management: Balancing exams, hackathons, and OSS contributions is hard but rewarding.

  • Debugging Mindset: Errors taught me to trace issues upstream.

  • Community Power: Mentor feedback is gold.


What’s Next?

  • Finalize the GSoC proposal for MillionAnimals.

  • Dive deeper into dataset standardization challenges.

  • Keep contributing to DeepForest’s core.


Final Thoughts

This journey has been a masterclass in open-source collaboration. Every merged PR fuels my motivation, and I’m thrilled to keep pushing forward. Stay tuned for updates on the GSoC proposal and my adventures in wildlife ML!

Let’s connect: GitHub | LinkedIn
