Mastering MLOps: Enhancing Machine Learning Workflows with Prefect - Insights from MLOps Zoomcamp 2023


The third topic covered in MLOps Zoomcamp by DataTalksClub is the orchestration of machine learning workflows. Orchestration involves managing and coordinating tasks and components within a machine learning pipeline or workflow. It streamlines and automates the various tasks and stages in a structured and controlled manner.
In this module, the orchestration platform used is Prefect. The tasks covered include:
Using a local Prefect server
Setting up files containing tasks and flows
Deploying the flow
Scheduling tasks
Sending email notifications within the workflow
Using Prefect Cloud
Automating notifications
Installing Prefect on your local computer is straightforward with pip inside a Python environment.
pip install prefect
Then start the server from the working directory by executing the following command.
prefect server start
Workflows in Prefect are built from at least two units: tasks and flows. Both are applied as decorators on Python functions. A task represents a unit of work within a workflow, which can be a simple or complex operation. A flow, on the other hand, is a collection of tasks arranged in a specific order with predefined dependencies. Here is an example implementation in Python.
import pandas as pd
from prefect import flow, task


@task(retries=3, retry_delay_seconds=2, name="Read taxi data")
def read_data(filename: str) -> pd.DataFrame:
    """Read data into DataFrame"""
    # Load or read data into dataframe code


@task(name="Extract the features")
def add_features(df_train: pd.DataFrame, df_val: pd.DataFrame):
    """Add features to the model"""
    # Extract the feature from data code


@task(log_prints=True, name="Train the model")
def train_best_model(X_train, X_val, y_train, y_val, dv):
    """train a model with best hyperparams and write everything out"""
    # train the model code


@flow
def send_notification_email(email_addresses: list[str], msg: str):
    """Send a notification email"""
    # send notification to emails code


@flow
def main_flow():
    """The main training pipeline"""
    # train_path, val_path, email_addresses, and msg are assumed to be
    # defined elsewhere; this skeleton does not show them
    # Load
    df_train = read_data(train_path)
    df_val = read_data(val_path)
    # Transform
    X_train, X_val, y_train, y_val, dv = add_features(df_train, df_val)
    # Train
    markdown_report = train_best_model(X_train, X_val, y_train, y_val, dv)
    # Send notification
    send_notification_email(email_addresses, msg)


if __name__ == "__main__":
    main_flow()
In this code, the following functions serve as tasks:
read_data()
add_features()
train_best_model()
The following functions serve as flows:
main_flow()
send_notification_email(), which runs as a subflow of main_flow() because it is called from inside it
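The retries=3 and retry_delay_seconds=2 arguments on read_data() ask Prefect to run the task again after a failure, waiting between attempts. The sketch below is a plain-Python illustration of that behavior, not Prefect's actual implementation. The names with_retries and flaky_read and the file name are hypothetical, and the delay is shortened so the example finishes quickly.

```python
import time
from functools import wraps


def with_retries(retries: int, retry_delay_seconds: float):
    """Hypothetical decorator: one initial attempt plus up to `retries` re-runs."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    # Out of retries: let the last error propagate
                    if attempt == retries:
                        raise
                    time.sleep(retry_delay_seconds)
        return wrapper
    return decorator


attempts = []


@with_retries(retries=3, retry_delay_seconds=0.01)
def flaky_read(filename: str) -> str:
    """Fails on the first two calls, then succeeds (simulated transient error)."""
    attempts.append(filename)
    if len(attempts) < 3:
        raise IOError("temporary failure")
    return f"loaded {filename}"


result = flaky_read("taxi.parquet")
print(result, "after", len(attempts), "attempts")
# prints: loaded taxi.parquet after 3 attempts
```

With retries=3, a function that never succeeds is attempted four times in total before the error is raised.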
In practice, I found that the implementation can be divided into two stages: development and deployment. In the development stage, we only need to make sure that the pipeline is built correctly, without yet deciding on scheduling, notifications, and other handling. With the Prefect server running, we can simply execute the file containing the tasks and flows. For example, if the code is in orchestrate.py in your working directory, execute the following command.
python orchestrate.py
The results can be viewed in the Prefect UI.
For the deployment stage, the following steps are taken:
Initialize the Prefect project by running the command
prefect project init
It generates some configuration files for the Prefect project. (In later Prefect 2 releases, this command was renamed to prefect init.)
Deploy the flow by running the command
prefect deploy orchestrate.py:main_flow -n hw_taxi1 -p zoomcamppool
In this example command, the deployment is named 'hw_taxi1' and the work pool that runs the flow is 'zoomcamppool'. The work pool can be created via the UI or the CLI.
Start a worker for the pool, making sure that the work pool is not in a paused or stopped state.
prefect worker start -p zoomcamppool
Run the deployed flow
prefect deployment run main-flow/hw_taxi1
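The identifier main-flow/hw_taxi1 has the form <flow name>/<deployment name>. Assuming Prefect 2's default naming, a flow declared without an explicit name takes its function name with underscores replaced by hyphens, which is why main_flow() shows up as main-flow. A minimal sketch of that rule follows; deployment_identifier is a hypothetical helper and not part of Prefect.

```python
def deployment_identifier(flow_function_name: str, deployment_name: str) -> str:
    """Build '<flow name>/<deployment name>' using the assumed default flow naming."""
    # Assumption: the default flow name is the function name with "_" replaced by "-"
    flow_name = flow_function_name.replace("_", "-")
    return f"{flow_name}/{deployment_name}"


print(deployment_identifier("main_flow", "hw_taxi1"))
# prints: main-flow/hw_taxi1
```

If the flow is given an explicit name in the @flow decorator, that name would be used instead of the one derived from the function.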
To set the schedule, we can utilize either the UI or the CLI. Here's an example for reference.
prefect deployment set-schedule --cron "0 9 3 * *" main-flow/hw_taxi1
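The cron expression "0 9 3 * *" reads, field by field, as minute 0, hour 9, day of month 3, every month, any day of the week, so the flow runs at 09:00 on the 3rd of each month. The sketch below lists upcoming run times for this one pattern using only the standard library. It is not a general cron parser, it ignores time zones, and next_monthly_runs is a hypothetical helper name.

```python
from datetime import datetime


def next_monthly_runs(start: datetime, count: int, day: int = 3, hour: int = 9) -> list[datetime]:
    """Return the next `count` times matching cron '0 <hour> <day> * *' after `start`."""
    runs = []
    year, month = start.year, start.month
    while len(runs) < count:
        candidate = datetime(year, month, day, hour, 0)
        if candidate > start:
            runs.append(candidate)
        # Move on to the next calendar month
        month += 1
        if month > 12:
            month, year = 1, year + 1
    return runs


for run in next_monthly_runs(datetime(2023, 6, 10, 12, 0), 3):
    print(run.isoformat())
# prints:
# 2023-07-03T09:00:00
# 2023-08-03T09:00:00
# 2023-09-03T09:00:00
```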
We can monitor the running flows and configure settings through the Prefect UI.
That's all for this week's article and progress. I believe there is still much to explore, especially regarding integration with other aspects of MLOps. Are there any alternative platforms for machine learning orchestration?
Written by Saiful Rijal
