Type-Safe ML Configs with Hydra + Pydantic (Step by Step)

Repo (optional): https://github.com/siddhi47/pydantic-hydra
Managing configurations in machine learning projects can get messy, fast. What starts as a few command-line arguments or a small JSON file (better than having no configuration at all) often grows into a tangle of hard-coded values, inconsistent file paths, and mysterious hyperparameter changes that are impossible to track. I briefly introduced configuration files and Hydra in my previous blogs.
In this tutorial, we’ll combine Hydra—a powerful framework for composing and overriding YAML configs—with Pydantic, which enforces strict type validation and catches mistakes before they crash your training run. Together, they give you flexible, readable, and type-safe configurations that scale from a single experiment to a full production ML pipeline.
Why Not Just Use argparse or a Plain Config Dict?
No type safety → Typos or wrong data types silently break your experiment.
No validation → Missing fields or invalid values fail at runtime (or worse, not at all).
Poor maintainability → Big projects with multiple config sections become unreadable.
Pydantic solves these problems by binding your config to a schema.
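As a tiny standalone illustration (not yet part of the tutorial project), here is Pydantic rejecting a wrongly-typed value the moment the config is loaded:

```python
from pydantic import BaseModel, ValidationError

class TrainingConfig(BaseModel):
    learning_rate: float
    batch_size: int

# A typo'd or wrongly-typed value fails loudly at load time,
# not three hours into training
try:
    TrainingConfig(learning_rate=0.001, batch_size="sixty-four")
except ValidationError as e:
    print(e)  # batch_size: Input should be a valid integer
```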
What you’ll build
A small ML-ready config system where:
Configs live in readable YAML files (organized by groups like data/, model/, training/).
Hydra composes and overrides configs from the command line.
Pydantic validates the composed config (types, required fields, bounds).
(BONUS) Includes a COCO dataset variant with file/path validation and safe defaults.
Prereqs
Python
Git
Basic familiarity with virtual envs
1) Project structure
mkdir pydantic-hydra && cd pydantic-hydra
mkdir -p conf/data conf/model conf/training src
touch main.py src/schema.py conf/config.yaml \
conf/data/coco.yaml conf/data/generic.yaml \
conf/model/resnet.yaml conf/training/default.yaml
Your directory structure should now look like this:
pydantic-hydra/
├─ conf/
│ ├─ config.yaml
│ ├─ data/
│ │ ├─ coco.yaml
│ │ └─ generic.yaml
│ ├─ model/
│ │ └─ resnet.yaml
│ └─ training/
│ └─ default.yaml
├─ src/
│ └─ schema.py
└─ main.py
2) Install deps
Option A: pip
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -U pip
pip install hydra-core omegaconf "pydantic>=2"
Option B: uv (if you prefer it)
uv venv && source .venv/bin/activate
uv pip install hydra-core omegaconf "pydantic>=2"
3) Write the Pydantic schema (src/schema.py)
We’ll model two dataset types—generic and coco—using a discriminated union so Pydantic knows which schema to apply based on the type field.
from typing import Literal, Union

from pydantic import BaseModel, Field


class ModelConfig(BaseModel):
    name: str
    hidden_units: int
    dropout: float


class DataConfig(BaseModel):
    type: Literal["generic"] = "generic"
    path: str
    shuffle: bool = True


class COCOConfig(DataConfig):
    type: Literal["coco"] = "coco"
    annotation_file: str
    image_size: int
    allowed_classes: list[str]  # must match the YAML key


class TrainingConfig(BaseModel):
    learning_rate: float
    batch_size: int
    epochs: int


class LoggingConfig(BaseModel):
    log_dir: str
    log_interval: int


class PipelineConfig(BaseModel):
    model: ModelConfig
    data: Union[DataConfig, COCOConfig] = Field(discriminator="type")
    training: TrainingConfig
    logging: LoggingConfig
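To see the discriminated union pick the right class, here is a trimmed-down, self-contained version of the schema (only a couple of fields kept) exercised with Pydantic v2’s TypeAdapter:

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter

class DataConfig(BaseModel):
    type: Literal["generic"] = "generic"
    path: str

class COCOConfig(DataConfig):
    type: Literal["coco"] = "coco"
    annotation_file: str

# The discriminator tells Pydantic to dispatch on the "type" field
adapter = TypeAdapter(
    Annotated[Union[DataConfig, COCOConfig], Field(discriminator="type")]
)

d = adapter.validate_python(
    {"type": "coco", "path": "/data", "annotation_file": "a.json"}
)
print(type(d).__name__)  # COCOConfig -- chosen because type == "coco"
```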
4) Write Hydra configs (YAML)
conf/config.yaml
defaults:
  - model: resnet
  - data: coco
  - training: default
  - _self_

logging:
  log_dir: ./logs
  log_interval: 50
conf/model/resnet.yaml
name: resnet50
hidden_units: 256
dropout: 0.4
conf/training/default.yaml
learning_rate: 0.0005
batch_size: 64
epochs: 20
conf/data/coco.yaml
type: coco
path: /mnt/datasets/coco2017
annotation_file: /mnt/datasets/coco2017/annotations/instances_train2017.json
image_size: 640
allowed_classes: [person, bicycle, car]
shuffle: true
conf/data/generic.yaml
type: generic
path: /mnt/datasets/mydataset
shuffle: true
5) Glue it together with Hydra (main.py)
# main.py
import hydra
from omegaconf import OmegaConf

from src.schema import COCOConfig, PipelineConfig


@hydra.main(config_path="conf", config_name="config", version_base=None)
def main(cfg):
    # 1) Hydra gives an OmegaConf object
    # 2) Convert it to a plain dict
    cfg_dict = OmegaConf.to_container(cfg, resolve=True)

    # 3) Validate with Pydantic
    validated = PipelineConfig(**cfg_dict)

    # Example usage
    print("Model:", validated.model.name)
    print("Data path:", validated.data.path)
    if isinstance(validated.data, COCOConfig):
        print("Annotation:", validated.data.annotation_file)
        print("Classes:", validated.data.allowed_classes)
        print("Image size:", validated.data.image_size)


if __name__ == "__main__":
    main()
Run it:
python main.py
6) Override anything from the CLI (Hydra superpower)
No file edits needed—compose on the fly:
# Switch to generic dataset
python main.py data=generic data.path=/data/custom
# Keep COCO, bump image size
python main.py data.image_size=1024
# Change allowed classes inline
python main.py data.allowed_classes='[person,car,dog]'
7) Multi-run sweeps (quick grid search)
Hydra can launch multiple runs with different overrides:
python main.py -m training.batch_size=32,64 training.learning_rate=0.001,0.0005
This spawns 4 runs:
(32, 0.001) (32, 0.0005)
(64, 0.001) (64, 0.0005)
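The sweep is just the Cartesian product of the override lists; in plain Python:

```python
from itertools import product

batch_sizes = [32, 64]
learning_rates = [0.001, 0.0005]

# Hydra launches one run per combination
runs = list(product(batch_sizes, learning_rates))
print(len(runs), "runs:", runs)
# 4 runs: [(32, 0.001), (32, 0.0005), (64, 0.001), (64, 0.0005)]
```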
8) Production tips
Keep configs modular: prefer many small files (e.g., data/coco.yaml, data/generic.yaml) over one giant YAML.
Validate paths & bounds: use Path fields and Field(ge=..., le=...) to catch mistakes early.
Stable defaults: avoid mutable defaults; prefer default_factory for lists like allowed_classes.
Reproducibility: the .hydra/ folder per run captures the composed config—commit it or store it with artifacts.
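The path, bounds, and default_factory tips can be sketched in one schema. The bounds (32–4096) are arbitrary examples; adjust to your pipeline:

```python
from pathlib import Path

from pydantic import BaseModel, Field

class COCOConfig(BaseModel):
    # str input is coerced to Path; use pydantic's DirectoryPath to require existence
    path: Path
    # reject absurd sizes at load time, not mid-training
    image_size: int = Field(ge=32, le=4096)
    # safe mutable default: each instance gets its own fresh list
    allowed_classes: list[str] = Field(default_factory=list)

cfg = COCOConfig(path="/mnt/datasets/coco2017", image_size=640)
print(cfg.path, cfg.allowed_classes)
```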
Wrap-up
You now have a clean, composable, and type-safe configuration system:
Hydra for composition & overrides
Pydantic for validation & helpful errors
YAML for readability & version control
This pattern scales from a single script to a full ML platform without turning your configs into spaghetti.
Note: I’ve used uv to create the package instead of creating the structure manually, so the GitHub repo may look a bit different from this project structure. Refer to my post here to learn more about this!
Written by Siddhi Kiran Bajracharya