A Data Scientist's Chronicle of Lessons Learned and Strategies for Success

"A data scientist combines hacking, statistics, and machine learning to collect, scrub, examine, model, and understand data. Data scientists are not only skilled at working with data, but they also value data as a premium product."

— Erwin Caniba

Around nine years ago, I decided to leave academia and put my learning to the test. On this occasion, I have decided to pen down the experiences gathered over those nine years of my professional voyage: a journey that has taken me through diverse industries, from the corridors of academia to the landscapes of corporate challenges. Below, I distill those experiences into a set of principles, strategies, and wisdom gained through years of exploration.

  • Always challenge your stakeholders on the problem definition, the underlying assumptions, and the importance of the problem at hand. Make sure you understand the business context and the challenges your stakeholders face.

  • In many real-world and industrial use cases, leveraging an existing solution promotes efficiency by avoiding unnecessary repetition. Often, you will find that your particular problem has already been solved. If it initially appears unique, try to abstract the problem and then look for an algorithm that suits that abstraction. This approach usually reveals that your problem corresponds to a well-known one with established solutions.

  • Start with design, before code. Your initial design is the key to the project's success. Break the solution down into testable, maintainable pieces. Review your design, put it out there, and ask relevant people to challenge it. Feedback is a gift, so embrace it.

  • Start by implementing unit tests (see the sketch after this list).

  • Be visible, don’t work in a silo, and always speak your mind.

  • Focus on the data rather than on the model. Also, think about other sources of data that could be used, and check their availability and accessibility.
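
To make the unit-test item above concrete, here is a minimal sketch, assuming `pytest` as the test runner; the `impute_median` helper and the behavior it pins down are hypothetical, purely for illustration:

```python
import math

import pandas as pd


def impute_median(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with missing values in `column` replaced by the median."""
    out = df.copy()
    out[column] = out[column].fillna(out[column].median())
    return out


def test_impute_median_fills_missing_values():
    df = pd.DataFrame({"age": [20.0, None, 40.0]})
    result = impute_median(df, "age")
    assert result["age"].isna().sum() == 0
    assert result.loc[1, "age"] == 30.0  # median of 20 and 40


def test_impute_median_does_not_mutate_its_input():
    df = pd.DataFrame({"age": [20.0, None, 40.0]})
    impute_median(df, "age")
    assert math.isnan(df.loc[1, "age"])  # the original frame is untouched
```

Writing the tests first forces the interface into small, testable pieces before any modeling code exists.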

"Data science isn't about the quantity of data but rather the quality."

— Joo Ann Lee

  • Always start from the simplest model that ticks the requirements (see the baseline sketch after this list).

  • More often than not, models make strong assumptions. So it is always critical to accompany your analysis and findings with a sensitivity analysis: how robust are your conclusions to the model's assumptions? Reliable inferences can be drawn only in light of a model-specific sensitivity analysis (see the sensitivity sketch after this list).

  • Ensure that the open-source tools embedded in decision-support solutions are accompanied by thorough documentation and will continue to receive support over time. Additionally, a strong user community around these tools is essential for securing the necessary expertise for future needs.

  • Solution adoption is more important than model performance. Use interpretable, glass-box models whenever possible (see the glass-box sketch after this list).

  • You cannot improve what you cannot measure. So always have your KPIs in place, and study the impact of changes on those KPIs.

  • It's good to work hard and get the whole to-do list finished. But more importantly, you need to focus on impact. Prioritize the work, gauge the impact of each feature, and show flexibility.

  • When joining a new organization, ensure a code review process is in place; make it part of the organization if it's not already. Always look for ways to make the code review process smoother and more productive.

  • Praising good work is more than a formality; it is a celebration of the commitment to excellence. Each milestone achieved and every challenge overcome is a testament to the unwavering dedication of individuals who go above and beyond. So recognize them in meetings and stand-ups.

  • Naming is important. When implementing the solution, you will need to coin names along the way, whether for processes, files, variables, or functions. Treat this as a selling opportunity.

  • Share your results with your stakeholders in incremental steps, and focus on the story and the implications of your work. In most cases, your stakeholders have no interest in the details of the algorithms you used or the solutions that failed. Just get to the bottom line and avoid overly technical presentations.

  • Do not focus only on building a great ML model in a notebook. Think automation, and automate as much as possible (see the pipeline sketch after this list).

  • When it comes to maintaining legacy code, start with increased test coverage, solidified APIs, and small steps toward an ideal state, isolated to subsystems behind the solidified interfaces (see the characterization-test sketch after this list). Avoid massive refactors: they introduce too much change to reason about at once, and any mistake costs momentum and trust, which can lead to rollbacks and abandonment at great expense. Take small steps that are easy to roll back, and avoid an all-or-nothing ethos around the success of the refactor.

  • Do not stop learning. Self-development is the key to both individual and team success.
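
For the "simplest model first" item, a minimal baseline sketch, assuming scikit-learn; the bundled dataset and the two models are stand-ins for whatever your problem actually involves:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The cheapest possible baseline: always predict the majority class.
baseline = DummyClassifier(strategy="most_frequent")
print("baseline accuracy:", cross_val_score(baseline, X, y, cv=5).mean())

# The simplest "real" model that could tick the requirements.
simple = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print("logistic accuracy:", cross_val_score(simple, X, y, cv=5).mean())
```

If the simple model already meets the requirements, anything heavier must justify its added complexity against these two numbers.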
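
For the sensitivity-analysis item, a minimal sketch, again assuming scikit-learn: it refits the model under a sweep of regularization strengths (one stand-in for a modeling assumption) and checks whether a headline conclusion, here the sign of a single coefficient, survives:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
feature = 0  # the coefficient our hypothetical conclusion rests on

# Refit under a range of regularization strengths and record the sign
# of the coefficient of interest each time.
signs = []
for C in np.logspace(-3, 3, 13):
    model = make_pipeline(StandardScaler(), LogisticRegression(C=C, max_iter=1000))
    model.fit(X, y)
    coef = model.named_steps["logisticregression"].coef_[0, feature]
    signs.append(np.sign(coef))

# If the sign flips anywhere in the sweep, the conclusion is fragile.
print("conclusion robust to regularization strength:", len(set(signs)) == 1)
```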
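
For the glass-box item, a minimal sketch using a shallow decision tree whose rules can be printed and walked through with stakeholders verbatim; the dataset and depth are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# The whole model fits on a slide: every prediction is a readable rule path.
print(export_text(tree, feature_names=list(data.feature_names)))
```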
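
For the automation item, a minimal sketch, assuming scikit-learn and joblib, of packaging preprocessing and model into a single artifact that a scheduled job can retrain and a downstream service can load, instead of a chain of notebook cells:

```python
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# One pipeline object = preprocessing + model, retrainable by a scheduled job.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X, y)
joblib.dump(pipeline, "model.joblib")

# Elsewhere (a batch job or an API), the same artifact is loaded and scored.
model = joblib.load("model.joblib")
print(model.predict(X[:5]))
```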
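
For the legacy-code item, a minimal characterization-test sketch; `legacy_pricing.compute_discount` is a hypothetical module under refactor, and the recorded expectations stand in for outputs observed from the current system. The point is to pin today's behavior behind the interface before changing anything:

```python
import pytest

from legacy_pricing import compute_discount  # hypothetical legacy module


# Golden-master cases recorded from the current system's actual output.
@pytest.mark.parametrize(
    ("basket_total", "customer_tier", "expected"),
    [
        (100.0, "gold", 15.0),
        (100.0, "silver", 7.5),
        (9.99, "gold", 0.0),  # pin even the surprising edge cases
    ],
)
def test_compute_discount_matches_recorded_behavior(basket_total, customer_tier, expected):
    assert compute_discount(basket_total, customer_tier) == pytest.approx(expected)
```

Once such tests pass against the untouched code, each small refactor step can be validated immediately and rolled back cheaply.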

"The best way to learn data science is to do data science."

— Chanin Nantasenamat


* Image Credit: DALL-E 3 with the prompt: Create an image where a puzzle game is featured at the center. As you move towards the outer edges of the image, the puzzle pieces seamlessly transform into pins. These pins are intricately connected to each other with wires, resembling a complex network of junctions in a bustling city. Ensure that the pins exhibit a variety of colors to enhance the vibrant and dynamic feel of the cityscape

Written by

Mohsen Davarynejad

Mohsen is a seasoned optimization expert who takes a principal role in implementing and validating data science and artificial intelligence models. PhD in AI • Senior Consultant #MachineLearning #DataScience @BasicFitNL Previous: @Shell @LINKITGroup @FU_Mashhad @TUDelft @ErasmusMC @DanielDenHoed @CapgeminiNL @KLM