Machine Learning (Part 2) – Machine Learning Workflow: My Take on Jason Brownlee’s Process


In machine learning projects, it’s valuable to develop a repeatable pattern or workflow that helps you quickly achieve good results from your data and models. Once you create this workflow, you can apply it repeatedly to get results faster and more reliably.
In this post, I want to share my take on the ML process I learned from Jason Brownlee. Let’s dive into the five essential steps for solving machine learning problems effectively:
The 5 Steps of the Machine Learning Workflow
Define the Problem
Prepare the Data
Spot Check Algorithms
Improve Results
Present Results
1. Define the Problem
Defining the problem is the first and most important step in any machine learning project. You can use the most advanced algorithms and powerful approaches, but the results will be useless if you are solving the wrong problem.
Jason Brownlee suggests a simple framework that helps to quickly see the problem from different perspectives. It has three parts:
1.1 What is the Problem?
Describe the problem informally, formally, and list assumptions as well as similar problems.
Informal description: Describe the problem as if you were explaining it to a friend. This helps create a clear, one-sentence description that conveys your understanding of the problem.
Example: “I need a program that will tell me which tweets will get retweets.”
Formalism: This follows Tom Mitchell’s classic formalism of machine learning:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Example:
Task (T): Classify a tweet that has not yet been published as likely to get retweets or not.
Experience (E): A corpus of tweets from an account where some tweets have retweets and some do not.
Performance (P): Classification accuracy — percentage of tweets correctly predicted.
Assumptions: List domain-specific assumptions or rules of thumb. These help speed up solution development and highlight parts of the problem that may need further review.
Examples:
The specific words used in the tweet matter to the model.
The identity of users who retweet does not matter.
The number of retweets matters to the model.
Older tweets are less predictive than more recent tweets.
Similar Problems: Consider related problems that can inform your approach by highlighting similarities or limitations.
Example: Email spam detection is a related binary classification problem using text data.
1.2 Why Does the Problem Need to be Solved?
This is about motivation and benefits — why solving this problem matters.
Motivation: What need will be fulfilled when the problem is solved? Sometimes you solve problems to learn new skills rather than just to get the best performance.
Solution Benefits: What new capabilities will your solution enable? Clarifying benefits helps you sell the project to colleagues or management and secure resources.
Solution Use: Consider how long you expect the solution to be used. Will it just produce a report or will it be operationalized as a system? This affects requirements for maintainability and scalability.
1.3 How Would I Solve the Problem Manually?
Before coding, think about how you would solve the problem by hand.
List step-by-step how you would collect data, prepare it, and design a program to solve the problem.
Identify prototypes and experiments needed — these reveal domain knowledge and uncertainties.
This step helps you understand where the data is stored, useful features, and potential pitfalls.
2. Prepare the Data
Machine learning algorithms learn from data, so feeding them the right data is critical. Even if your data is good, you may need to format it, scale it, and extract meaningful features from it.
Data preparation involves three main steps:
2.1 Data Selection
Assess the availability of data.
Identify missing data and data that can be removed.
Include only data variables that strongly relate to the problem.
Ask yourself:
What data do I have?
What data do I wish I had?
What data can I exclude and why?
2.2 Data Preprocessing
Organize your data by formatting, cleaning, and sampling.
Formatting: Convert data into usable formats, such as transforming text files into relational databases.
Cleaning: Remove irrelevant, duplicate, or sensitive data that do not help the problem. Handle missing data thoughtfully.
Sampling: If your dataset is very large, select a representative subset to speed up prototyping.
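To make this concrete, here’s a minimal pandas sketch of cleaning and sampling, assuming a hypothetical raw_tweets.csv file with a retweet_count column (the names are illustrative, not from Jason’s material):

```python
import pandas as pd

# Load the raw data (file and column names are hypothetical)
df = pd.read_csv("raw_tweets.csv")

# Cleaning: drop exact duplicates and rows missing the target
df = df.drop_duplicates()
df = df.dropna(subset=["retweet_count"])

# Sampling: prototype on a representative 10% subset
sample = df.sample(frac=0.1, random_state=42)
```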
2.3 Data Transformation
Transform preprocessed data for machine learning using feature engineering techniques such as scaling, aggregation, and decomposition.
Scaling: Adjust feature values to similar scales (e.g., dollars vs kilograms).
Decomposition: Break complex features into simpler components.
Aggregation: Combine multiple features into one meaningful feature.
Transformation choices depend on the algorithm used and domain knowledge. You may need to revisit this step multiple times.
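As a rough illustration, reusing the hypothetical sample DataFrame from the preprocessing sketch (all column names are made up), the three techniques might look like this:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Scaling: bring numeric features onto comparable scales
cols = ["follower_count", "tweet_length"]  # hypothetical columns
sample[cols] = StandardScaler().fit_transform(sample[cols])

# Decomposition: split a timestamp into simpler component features
sample["posted_at"] = pd.to_datetime(sample["posted_at"])
sample["hour"] = sample["posted_at"].dt.hour
sample["weekday"] = sample["posted_at"].dt.dayofweek

# Aggregation: combine related counts into one meaningful feature
sample["engagement"] = sample["likes"] + sample["replies"]
```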
3. Spot Check Algorithms
Spot checking means testing many different machine learning algorithms quickly to identify which ones work well with your dataset.
Use 10-fold cross-validation to reliably estimate performance.
Use statistical tests to separate meaningful results from noise.
Use box plots to visualize accuracy distributions across algorithms.
Common algorithms to spot check include:
Decision Trees
Logistic Regression
Support Vector Machines (SVM)
k-Nearest Neighbors (k-NN)
Naive Bayes
Random Forests
Jason typically runs 10–20 algorithms from all the major algorithm families.
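Here’s one way a spot check could look in scikit-learn, using a synthetic stand-in dataset (a minimal sketch, not Jason’s exact harness):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=42)  # stand-in dataset

models = {
    "Decision Tree": DecisionTreeClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(),
}

# 10-fold cross-validation gives a distribution of scores per algorithm
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

The per-algorithm score arrays collected here are exactly what you would feed into a box plot to compare accuracy distributions.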
4. Improve Results
Having one or two algorithms that perform reasonably well on a problem is a good start, but sometimes you may be incentivized to get the best result you can, given the time and resources you have available.
This is the part where you have to squeeze out extra performance and improve the results you get from your machine learning algorithms.
When tuning algorithms, we must have high confidence in the results given by our test harness. This means we should use techniques that reduce the variance of the performance measure we’re using to assess algorithm runs — such as cross-validation with a reasonably high number of folds.
Algorithm Tuning (Hyperparameter Optimization)
Also called hyperparameter optimization, this is the process of modifying the settings of a machine learning algorithm to improve its performance on your specific problem.
Even if you already chose a good algorithm, you can often boost its performance significantly just by adjusting its hyperparameters.
Think of tuning like adjusting knobs on a machine — each knob (parameter) affects how the algorithm learns and performs.
A good place to start is to get better results from algorithms that you already know perform well, by exploring and fine-tuning their configurations.
ML algorithms are parameterized, and modification of those parameters can influence the outcome. Think of each algorithm’s parameters like dimensions on a graph, with the values of a given parameter as a point along the axis.
→ 3 parameters form a cube,
→ n parameters form an n-dimensional hypercube of possible configurations.
To explore this hypercube efficiently, you can use automated methods that impose a grid on the possibility space and sample where good algorithm configurations might be. Then we can use several search methods like:
Grid Search
Random Search
Bayesian Optimization
These methods help zoom in on optimal performance using optimization algorithms — and we repeat the process until we find the best we can achieve.
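For instance, a grid search over a few axes of that hypercube might look like this in scikit-learn (a sketch reusing the X, y from the spot-check example; the random forest and the grid values are arbitrary choices, not recommendations):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

# Each hyperparameter is one axis of the hypercube;
# the grid samples points along each axis
param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=10,  # a high fold count to reduce variance, as noted above
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

RandomizedSearchCV follows the same pattern but samples random points from the grid instead of trying every combination.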
Ensembles
Ensemble methods are concerned with combining the predictions of multiple models to improve overall results.
They work well when you have multiple “good enough” models that specialize in different aspects of the problem.
Here are three ensemble strategies you can explore:
Bagging (Bootstrap Aggregation): The same algorithm is trained on different subsets of the training data, giving different perspectives on the problem.
Boosting: Models are trained sequentially on the same training data, with each new model focusing on correcting the mistakes of the previous ones.
Blending (also known as Stacked Generalization or Stacking): A variety of models are trained, and their predictions are passed to a meta-model that learns how to combine them for an overall prediction.
💡 It’s a good idea to get into ensemble methods after you’ve exhausted traditional methods.
Why? Because:
They’re generally more complex.
Traditional methods provide a solid baseline and foundation you can draw from to create better ensembles.
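In scikit-learn, minimal versions of the three strategies can be set up like this (a sketch; the base models and settings are arbitrary illustrations):

```python
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier, StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Bagging: one algorithm trained on many bootstrap samples of the data
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50)

# Boosting: models built sequentially, each focusing on earlier mistakes
boosting = AdaBoostClassifier(n_estimators=50)

# Stacking: a meta-model (logistic regression here) learns to combine
# the base models' predictions into one overall prediction
stacking = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("nb", GaussianNB())],
    final_estimator=LogisticRegression(),
)
```

Each of these can then be evaluated with the same cross-validation harness used during spot checking.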
Extreme Feature Engineering
Algorithm tuning and ensembles help you get more from machine learning algorithms.
This third strategy is about exposing more structure in the problem so algorithms can learn better.
Your dataset likely has complex, multi-dimensional structure embedded in it — and ML algorithms can find and exploit that structure if you expose it properly.
Some of those structures may be too dense or complex for algorithms to uncover without help. That’s where feature engineering comes in.
A powerful tactic is to decompose attributes into multiple features. This breaks dependencies and non-linear relationships down into simpler, more independent, linear ones that the algorithm can learn from more effectively.
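As a tiny sketch of what this can look like, a single categorical attribute (a hypothetical media_type column) can be decomposed into independent binary features:

```python
import pandas as pd

# One complex attribute hides several independent signals;
# one-hot encoding exposes each signal as its own feature
tweets = pd.DataFrame({"media_type": ["photo", "video", "none", "photo"]})
decomposed = pd.get_dummies(tweets["media_type"], prefix="media")
print(decomposed)  # media_none, media_photo, media_video columns
```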
5. Present Results
A model’s results are meaningless unless communicated or operationalized effectively.
Depending on the type of problem you are trying to solve, the presentation of results will be very different. There are two main facets to making use of the results of your machine learning endeavor:
Report the Results
Once you have discovered a good model and a good enough result (or not, as the case may be), you will want to summarize what was learned and present it to stakeholders.
Jason uses the following report template, which can take the form of a text document, a formal report, or presentation slides:
Context (Why): Define the environment in which the problem exists and set up the motivation for the research question.
Problem (Question): Concisely describe the problem as a question that you went out and answered.
Solution (Answer): Concisely describe the solution as an answer to the question you posed in the previous section. Be specific.
Findings: Bulleted lists of discoveries you made along the way that will interest the audience. They may be discoveries in the data, methods that did or did not work, or the model performance benefits you achieved along your journey.
Limitations: Consider where the model does not work or questions that the model does not answer. Do not shy away from these; claims about where the model excels are more trusted when you can also define where it does not.
Conclusions (Why + Question + Answer): Revisit the “why”, the research question, and the answer you discovered, packaged tightly so it is easy to remember and repeat for yourself and others.
Visualizations help make reports clearer and more impactful.
Operationalize the System
Once a machine learning model performs well enough to address a real-world problem, the next critical step is operationalizing it: deploying it into a production environment. Before deployment, it’s essential to carefully consider three key areas: algorithm implementation (can the model run reliably in the target system?), automated testing (will it continue to work correctly and safely after changes?), and performance tracking (will it stay effective over time, or will its performance decay?). These three issues will very likely influence the type of model you choose.
Algorithm Implementation:
When you're developing and testing machine learning models, you probably use popular research-focused libraries like scikit-learn, PyTorch, or TensorFlow. These tools are great for experimenting, trying different models, and finding out which algorithm gives the best accuracy or performance. But think very hard about the dependencies and technical debt you may be creating by taking that experimental code and plugging it straight into your final system. Consider locating a production-level library that supports the method you wish to use; you may have to repeat the process of algorithm tuning if you switch libraries at this point.
When switching from a research library to a production one, the behavior of the model might change slightly, because the internal implementation may differ. You may need to re-tune hyperparameters (like learning rate, max depth, or regularization) and re-test the model’s performance.
Another option is to code the algorithm from scratch, especially if your problem is unique, you need full control over performance, or you want to eliminate third-party dependencies. This option may introduce risk, though, depending on the complexity of the algorithm you have chosen and the implementation tricks it uses.
Model Tests:
Create automated tests to ensure that your machine learning model can be consistently built and reaches a minimum acceptable level of performance each time it's trained. This means writing code (with a framework such as pytest or unittest) that runs whenever you update your model or codebase; the tests should check metrics like accuracy or F1-score to make sure the model performs well every time. Additionally, write tests for your data preprocessing steps to catch issues early. For reliable and reproducible results, it's recommended to fix random seeds during testing so that the same outputs are produced every time the tests run.
Tracking:
Set up infrastructure that continuously monitors your model’s performance in production. Trigger alerts if performance metrics like accuracy fall below a defined threshold. Monitoring can be done in real time or periodically, using fresh data samples on a copy of the model in a safe testing environment. A drop in performance could indicate concept drift, meaning the data patterns have changed, and the model might require retraining or fine-tuning. Some models support online learning (they update themselves while running), but letting models retrain automatically in production can be risky. In many cases, it’s safer to manage this manually: regularly evaluate, retrain offline, and replace old models only after verifying that the new version performs better.
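A bare-bones sketch of such a periodic check (the threshold and names are illustrative):

```python
from sklearn.metrics import accuracy_score

ALERT_THRESHOLD = 0.80  # illustrative minimum acceptable accuracy

def check_model_health(model, X_fresh, y_fresh):
    """Score the model on a fresh labeled sample and flag possible drift."""
    acc = accuracy_score(y_fresh, model.predict(X_fresh))
    if acc < ALERT_THRESHOLD:
        print(f"ALERT: accuracy {acc:.3f} below {ALERT_THRESHOLD}: possible concept drift")
    return acc
```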
Conclusion
Following a clear, structured workflow from defining the problem to deploying the model helps you build effective machine learning solutions efficiently. Jason Brownlee’s process emphasizes:
Thorough problem understanding
Careful data preparation
Quick algorithm experimentation
Systematic improvement through tuning and ensembles
Clear communication and robust operationalization
By applying these steps repeatedly, you’ll gain speed, confidence, and better results in your ML projects.