Project Structure

A well-structured project makes it easier to scale, debug, and maintain code. Below is the directory structure of Predict-Pipe:
Predict_Pipe/
│── .github/workflows/.gitkeep
│── config/
│ ├── config.yaml
│── params.yaml
│── main.py
│── Dockerfile
│── setup.py
│── research/
│ ├── trials.ipynb
│── templates/
│ ├── index.html
│── src/
│ ├── Predict_Pipe/
│ │ ├── __init__.py
│ │ ├── components/
│ │ │ ├── __init__.py
│ │ ├── utils/
│ │ │ ├── __init__.py
│ │ │ ├── common.py
│ │ ├── logging/
│ │ │ ├── __init__.py
│ │ ├── config/
│ │ │ ├── __init__.py
│ │ │ ├── configuration.py
│ │ ├── pipeline/
│ │ │ ├── __init__.py
│ │ ├── entity/
│ │ │ ├── __init__.py
│ │ │ ├── config_entity.py
│ │ ├── constants/
│ │ │ ├── __init__.py
NOTE: This directory structure is generated by the template script given below. Additional files are added manually to the respective directories as needed.
Key Files & Directories
.github/workflows/: Stores CI/CD workflows.
config/: Stores configuration files like config.yaml.
params.yaml: Defines hyperparameters and other settings.
main.py: The entry point of the application.
Dockerfile: Defines the containerization process.
setup.py: Script for packaging and installing the project.
research/: Stores Jupyter notebooks for experimentation.
templates/: Contains HTML templates for web-based interactions.
src/: The core codebase containing:
  components/: Modules for data ingestion, validation, transformation, training, and evaluation.
  utils/: Utility functions like saving/loading models.
  logging/: Custom logging setup.
  config/: Configuration handling.
  pipeline/: Orchestrates the ML pipeline.
  entity/: Stores entity definitions.
  constants/: Defines project-wide constants.
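To make the role of entity/ and config/ more concrete, here is a minimal sketch of what an entity definition could look like. Everything in it is an assumption for illustration: the class name DataIngestionConfig, its fields, and the helper build_data_ingestion_config are hypothetical and not taken from the Predict-Pipe code. The idea is only that an entity is a small typed container, and the configuration layer turns raw settings (for example, values parsed from config/config.yaml) into such containers.

```python
from dataclasses import dataclass
from pathlib import Path


# Hypothetical entity, e.g. something that might live in entity/config_entity.py.
# frozen=True makes instances read-only after creation.
@dataclass(frozen=True)
class DataIngestionConfig:
    root_dir: Path
    source_url: str
    local_data_file: Path


# Hypothetical helper, e.g. something that might live in config/configuration.py.
# It converts a plain dict of settings into the typed entity above.
def build_data_ingestion_config(raw: dict) -> DataIngestionConfig:
    return DataIngestionConfig(
        root_dir=Path(raw["root_dir"]),
        source_url=raw["source_url"],
        local_data_file=Path(raw["local_data_file"]),
    )


# Illustrative values only; a real project would read these from config.yaml.
example = build_data_ingestion_config({
    "root_dir": "artifacts/data_ingestion",
    "source_url": "https://example.com/data.zip",
    "local_data_file": "artifacts/data_ingestion/data.zip",
})
print(example.root_dir)
```

Keeping entities frozen means a pipeline stage cannot accidentally change its own settings halfway through a run, which is one reason this pattern is often paired with a separate configuration module.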
Project Template Script
To automate the creation of this project structure, we use the following Python script:
import os
from pathlib import Path
import logging

logging.basicConfig(level=logging.INFO, format='[%(asctime)s]: %(message)s')

project_name = 'Predict_Pipe'

# Files to create, relative to the directory the script is run from
list_of_files = [
    ".github/workflows/.gitkeep",
    f"src/{project_name}/__init__.py",
    f"src/{project_name}/components/__init__.py",
    f"src/{project_name}/utils/__init__.py",
    f"src/{project_name}/utils/common.py",
    f"src/{project_name}/logging/__init__.py",
    f"src/{project_name}/config/__init__.py",
    f"src/{project_name}/config/configuration.py",
    f"src/{project_name}/pipeline/__init__.py",
    f"src/{project_name}/entity/__init__.py",
    f"src/{project_name}/entity/config_entity.py",
    f"src/{project_name}/constants/__init__.py",
    "config/config.yaml",
    "params.yaml",
    "main.py",
    "Dockerfile",
    "setup.py",
    "research/trials.ipynb",
    "templates/index.html",
]

for filepath in list_of_files:
    filepath = Path(filepath)
    filedir, filename = os.path.split(filepath)

    # Create the parent directory unless the file sits at the project root
    if filedir != "":
        os.makedirs(filedir, exist_ok=True)
        logging.info(f"Creating directory: {filedir} for the file: {filename}")

    # Create the file only if it is missing or empty, so existing content is kept
    if (not os.path.exists(filepath)) or (os.path.getsize(filepath) == 0):
        with open(filepath, "w") as f:
            pass
        logging.info(f"Creating empty file: {filepath}")
    else:
        logging.info(f"{filename} already exists")
This script ensures that all necessary files and directories are created automatically, maintaining a clean and reproducible structure for your project.
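One way to convince yourself that the scaffolding behaves as described is to run the same logic inside a throwaway directory and check the result. The sketch below is an assumption-laden variant, not the script above: the function names scaffold and missing_paths are hypothetical, it uses pathlib throughout instead of os.path, and it takes an explicit root directory so nothing is written into your real project.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def scaffold(root: Path, relative_paths: Iterable[str]) -> None:
    """Create each file and its parent directories under root.

    Files that already have content are left untouched.
    """
    for rel in relative_paths:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists() or target.stat().st_size == 0:
            target.touch()


def missing_paths(root: Path, relative_paths: Iterable[str]) -> List[str]:
    """Return the relative paths that do not exist as files under root."""
    return [rel for rel in relative_paths if not (root / rel).is_file()]


# Try it in a temporary directory that is deleted afterwards.
sample_files = [
    "src/Predict_Pipe/__init__.py",
    "src/Predict_Pipe/utils/common.py",
    "config/config.yaml",
    "params.yaml",
]
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    scaffold(root, sample_files)
    print("Missing after scaffold:", missing_paths(root, sample_files))
```

Running scaffold a second time on the same directory should change nothing, which is the property that makes the template script safe to re-run as the project grows.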
Next, we will break down the modular pipeline components in detail.
Written by Yash Maini
