DDD Value Objects: Mastering Data Validation in Python
Although DDD (Domain-Driven Design) is not widely adopted within the Python community, there are several resources available on how to implement this approach in the language. Unfortunately, only a few of them offer a good way of defining Value Objects that really ensure data consistency.
This article will walk through these implementation techniques, illustrating their strengths and weaknesses based on a particular example. We will also discuss practical tips on how to manage single-field Value Objects in a deft way. Finally, we will address the problem of redundant validation in Pydantic and how to avoid it.
Introduction
One of the greatest advantages of using Value Objects is that they ensure their values always adhere to business rules (invariants) by validating input data in the constructor. Therefore, when we receive an instance of a Value Object, we can be certain that its value has already been validated.
A prime example of the usefulness of Value Objects is the Price class. When working with prices, developers usually use Decimal as a data type. Although this approach is better than using float, it still has drawbacks. We typically do not allow negative values or those with excessive precision, like 0.0000001. It can be cumbersome to validate Decimal instances this way in several places in our codebase. Hence, creating a Price class with validation at the constructor level can greatly simplify and secure the code.
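As a rough illustration of the idea, here is a minimal sketch of such a Price class. The field name amount and the two-decimal-place limit are assumptions made for this example, not rules taken from any particular domain.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Price:
    """Hypothetical Price value object: non-negative, at most two decimal places."""

    amount: Decimal

    def __post_init__(self) -> None:
        if not isinstance(self.amount, Decimal):
            raise TypeError("amount must be a Decimal")
        if not self.amount.is_finite():
            raise ValueError("amount must be a finite number")
        if self.amount < 0:
            raise ValueError("amount must not be negative")
        # The exponent is -2 for Decimal("9.99") and -7 for Decimal("0.0000001").
        # Trailing zeros count as precision in this simple check.
        if self.amount.as_tuple().exponent < -2:
            raise ValueError("amount must have at most two decimal places")


price = Price(Decimal("9.99"))  # passes every check
```

Any code that receives a Price built this way can rely on the invariants without repeating the checks.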
Another use case for Value Objects that we will focus on in the remaining part of the article is email validation. Since an email address is a single value, it is more natural to treat it like a simple data type rather than a complex structure. What we want to achieve in the next paragraphs is to validate the email address and make it lowercase.
#0 Primitive Data Type Approach
The most straightforward way of creating a Value Object is by extending the behavior of some base type. This is quite convenient, especially for single-value objects. Then we need to manually check the input data and alter it if needed.
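import re

class EmailAddress(str):
    _REGEX = r"^\S+@\S+\.\S+$"

    def __new__(cls, value: str) -> "EmailAddress":
        # Normalize first, then validate the normalized value.
        normalized = str(value).lower()
        if not re.match(cls._REGEX, normalized):
            raise ValueError("Invalid format")
        # Build the instance from the normalized string so the result is an EmailAddress.
        return super().__new__(cls, normalized)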
In the snippet above, EmailAddress is a subclass of str. This can be problematic because it inherits every str method, including those that are not suitable for an email address. For example, nothing prevents us from writing code like this:
email = EmailAddress("alice@mail.com")
...
email += '\nRegards!' # Invalid state
In this scenario, it's not a big deal, but with the earlier price example the consequences can be more serious:
price = Price(1) # Decimal-based value object
...
discount = 2
price -= discount # Invalid state
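To make the problem concrete, here is a small sketch with a hypothetical Decimal-based Price that rejects negative amounts in its constructor. The class is an assumption made for this example. The point is that arithmetic inherited from Decimal returns a plain Decimal, so the check in __new__ never runs on the result.

```python
from decimal import Decimal


class Price(Decimal):
    """Hypothetical Decimal-based value object that rejects negative amounts."""

    def __new__(cls, value):
        instance = super().__new__(cls, value)
        if instance < 0:
            raise ValueError("Price must not be negative")
        return instance


price = Price(1)
discounted = price - 2  # Decimal.__sub__ runs here, not Price.__new__

print(type(discounted).__name__, discounted)  # Decimal -1
```

The negative amount slips through because the result is no longer a Price at all.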
#1 Dataclass Approach
In many resources, including Cosmic Python, authors recommend using dataclasses, which is understandable as the module is part of the Standard Library. Here's an example:
import re
from dataclasses import dataclass
from typing import ClassVar

@dataclass(frozen=True)
class EmailAddress:
    value: str
    _REGEX: ClassVar[str] = r"^\S+@\S+\.\S+$"

    def __post_init__(self):
        if not isinstance(self.value, str):
            raise ValueError("Value must be a string")
        if not re.match(self._REGEX, self.value):
            raise ValueError("Invalid format")
        object.__setattr__(self, "value", self.value.lower())
The dataclass is frozen, so in order to make the value attribute lowercase we need to use the tricky object.__setattr__(). Now we finally don't have to worry about unexpected methods in our EmailAddress, but as you can tell, it is quite a lot of code for simple email validation. We also have to check the input type manually, since dataclasses do not validate field types at runtime. To make matters worse, our regex-based validation is still pretty leaky. Why reinvent the wheel if there are already tools created specifically for this purpose?
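The sketch below repeats the dataclass from this section and exercises it, to show what we gain (normalization, value equality, immutability) and where the regex leaks. The sample addresses are made up for illustration.

```python
import re
from dataclasses import FrozenInstanceError, dataclass
from typing import ClassVar


@dataclass(frozen=True)
class EmailAddress:
    value: str
    _REGEX: ClassVar[str] = r"^\S+@\S+\.\S+$"

    def __post_init__(self):
        if not isinstance(self.value, str):
            raise ValueError("Value must be a string")
        if not re.match(self._REGEX, self.value):
            raise ValueError("Invalid format")
        object.__setattr__(self, "value", self.value.lower())


email = EmailAddress("Alice@Mail.com")
assert email.value == "alice@mail.com"          # normalized in __post_init__
assert email == EmailAddress("alice@mail.com")  # compared by value, not identity

try:
    email.value = "bob@mail.com"
except FrozenInstanceError:
    print("instances are immutable")

# The regex is leaky: a doubled "@" still passes the check.
leaky = EmailAddress("alice@@mail.com")
```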
#2 Pydantic BaseModel Approach
Here is where Pydantic really shines. It automatically handles type casting and validation. Using the Annotated type allows adding extra logic in a concise manner.
from typing import Annotated
# EmailStr requires the optional email-validator package.
from pydantic import AfterValidator, BaseModel, ConfigDict, EmailStr

class EmailAddress(BaseModel):
    value: Annotated[EmailStr, AfterValidator(lambda x: x.lower())]
    model_config = ConfigDict(frozen=True)
The class definition became really neat, but because BaseModel is a complex structure, we need to pass the value keyword every time we want to specify or access the source value.
Let's consider a common scenario where, in a FastAPI project, we have to use a Value Object as a field in the Request Body. The sample code may look like this:
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class SignUpRequestBody(BaseModel):
    email: EmailAddress
    password: str

@app.post("/sign_up")
async def sign_up(request_body: SignUpRequestBody):
    ...
Because our Value Object is a Pydantic model, all validation error messages are propagated directly into the API response. The problem is that EmailAddress inherits from BaseModel, so it is treated as a nested object. In other words, the user cannot pass a plain string in the email field but instead has to send a body like:
{"email": {"value": "abc@gmail.com"}, "password": "Secret123!"}
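The nesting can be reproduced without running FastAPI by validating the body directly with model_validate. This is only a sketch under two assumptions: Pydantic v2 is installed, and a simple regex helper (the hypothetical normalize_email) stands in for EmailStr so the example does not need the optional email-validator package.

```python
import re
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ConfigDict, ValidationError


def normalize_email(value: str) -> str:
    # Hypothetical stand-in for EmailStr: a loose regex check plus lowercasing.
    if not re.match(r"^\S+@\S+\.\S+$", value):
        raise ValueError("Invalid format")
    return value.lower()


class EmailAddress(BaseModel):
    model_config = ConfigDict(frozen=True)
    value: Annotated[str, AfterValidator(normalize_email)]


class SignUpRequestBody(BaseModel):
    email: EmailAddress
    password: str


# The nested shape is accepted, and the source value sits behind ".value".
nested = SignUpRequestBody.model_validate(
    {"email": {"value": "ABC@gmail.com"}, "password": "Secret123!"}
)
print(nested.email.value)

# The flat shape a client would naturally send is rejected.
flat_rejected = False
try:
    SignUpRequestBody.model_validate(
        {"email": "abc@gmail.com", "password": "Secret123!"}
    )
except ValidationError:
    flat_rejected = True
print(flat_rejected)
```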
#3 Pydantic RootModel Approach
This problem can be fixed by replacing BaseModel with the lesser-known RootModel.
from typing import Annotated
from pydantic import AfterValidator, ConfigDict, EmailStr, RootModel

class EmailAddress(RootModel):
    root: Annotated[EmailStr, AfterValidator(lambda x: x.lower())]
    model_config = ConfigDict(frozen=True)
RootModel expects only a single field named root. This way, wherever we use EmailAddress, it will act as a regular Pydantic field, so the expected input will look like:
{"email": "abc@gmail.com", "password": "Secret123!"}
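A quick way to check this behavior outside of FastAPI is to validate and dump the body by hand. As before, this is a sketch that assumes Pydantic v2 and uses a hypothetical regex helper, normalize_email, in place of EmailStr to avoid the email-validator dependency.

```python
import re
from typing import Annotated

from pydantic import AfterValidator, BaseModel, ConfigDict, RootModel


def normalize_email(value: str) -> str:
    # Hypothetical stand-in for EmailStr: a loose regex check plus lowercasing.
    if not re.match(r"^\S+@\S+\.\S+$", value):
        raise ValueError("Invalid format")
    return value.lower()


class EmailAddress(RootModel):
    model_config = ConfigDict(frozen=True)
    root: Annotated[str, AfterValidator(normalize_email)]


class SignUpRequestBody(BaseModel):
    email: EmailAddress
    password: str


# A plain string is accepted for the email field.
body = SignUpRequestBody.model_validate(
    {"email": "ABC@gmail.com", "password": "Secret123!"}
)
print(body.email.root)    # the validated value lives in ".root"
print(body.model_dump())  # the email is dumped as a plain string again
```

The Value Object stays a distinct type inside the application, while the wire format remains a flat string.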
Avoiding unnecessary re-validation
Layered architecture requires repacking data as it moves between layers. Value Objects belong to the lowest layer, so they are used extensively throughout the codebase. Let's consider some sample pseudo-code with a sign-up endpoint implemented in FastAPI.
class SignUpRequestBody(BaseModel):
    email: EmailAddress
    password: str

class User(BaseModel):
    email: EmailAddress
    password_hash: str

@app.post("/sign_up")
async def sign_up(request_body: SignUpRequestBody):
    ...
    email = request_body.email
    password_hash = PasswordHasher().generate(request_body.password)
    with UnitOfWork() as unit_of_work:
        user = User(email=email, password_hash=password_hash)
        unit_of_work.repository.create(user)
    ...
EmailAddress is used in both the SignUpRequestBody and User classes. Before the sign_up function is executed, the request body needs to be validated, which means that EmailAddress validation is also performed. When we instantiate the User class, the EmailAddress field is processed by Pydantic once again; depending on the model's revalidate_instances setting (which defaults to 'never' in Pydantic v2), this can mean running the entire EmailAddress validation logic a second time. This is a simplified example, but in reality, the amount of "repacking" can be much larger. To rule out unnecessary revalidation explicitly, we can wrap our value object with pydantic.InstanceOf.
from pydantic import BaseModel, InstanceOf

class User(BaseModel):
    email: InstanceOf[EmailAddress]
    password_hash: str
This way, Pydantic only checks whether the given value is an instance of the specified class. Of course, in the outermost model, we still need to use the bare Value Object class, but within all inner classes, we should use the InstanceOf wrapper.
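One way to observe this is to count how often the email validator actually runs. The sketch below assumes Pydantic v2; the validation_calls counter and the normalize_email helper (a regex stand-in for EmailStr) are hypothetical names introduced only for this example.

```python
import re
from typing import Annotated

from pydantic import (
    AfterValidator,
    BaseModel,
    ConfigDict,
    InstanceOf,
    RootModel,
    ValidationError,
)

validation_calls = 0  # counts how often the email validator actually runs


def normalize_email(value: str) -> str:
    # Hypothetical stand-in for EmailStr that also records each call.
    global validation_calls
    validation_calls += 1
    if not re.match(r"^\S+@\S+\.\S+$", value):
        raise ValueError("Invalid format")
    return value.lower()


class EmailAddress(RootModel):
    model_config = ConfigDict(frozen=True)
    root: Annotated[str, AfterValidator(normalize_email)]


class User(BaseModel):
    email: InstanceOf[EmailAddress]
    password_hash: str


email = EmailAddress("ABC@gmail.com")  # validated once, at the boundary
user = User(email=email, password_hash="<hash>")

print(validation_calls)     # 1: building User did not run the validator again
print(user.email is email)  # True: the same instance is passed through

# In Python mode, the inner model accepts only ready-made instances.
try:
    User(email="abc@gmail.com", password_hash="<hash>")
except ValidationError:
    print("raw strings are rejected by the inner model")
```

The rejected raw string shows the trade-off: inner models become stricter, which is why the outermost model still has to use the bare class.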
Takeaways
In conclusion, with Pydantic, we can achieve data validation nearly as effectively as in statically typed languages. Additionally, it supports annotated metadata, which makes the code concise.
To encapsulate the insights shared, here are three key points discussed in the article:
Pydantic models provide automatic type validation
For single-field value objects, RootModel is preferred over BaseModel
Using InstanceOf prevents unnecessary re-validation of the same Value Object