Python @dataclass decorator


The dataclasses module, introduced in Python 3.7, provides the @dataclass decorator, which is a powerful tool for creating classes primarily used to store data. It significantly reduces boilerplate code by automatically generating common methods like __init__, __repr__, __eq__, and more.
Why Use Dataclasses?
Before dataclass, defining a simple data-holding class often involved writing repetitive code:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"Point(x={self.x}, y={self.y})"

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x and self.y == other.y
With @dataclass, this becomes much cleaner and more concise.
Basic Usage
To turn a regular class into a dataclass, simply apply the @dataclass decorator:
from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

# Create an instance
p1 = Point(10, 20)
print(p1)  # Output: Point(x=10, y=20)

# Equality comparison
p2 = Point(10, 20)
p3 = Point(30, 40)
print(p1 == p2)  # Output: True
print(p1 == p3)  # Output: False
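As you can see, __init__, __repr__, and __eq__ are automatically generated. The type annotations are mandatory: @dataclass discovers fields by looking at the class's annotations, so an attribute without an annotation is left as a plain class attribute and is not treated as a field. The annotations are not checked at runtime, so Point("a", "b") is still accepted.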
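The role of annotations can be seen in a small sketch. The Config class and its attribute names below are hypothetical, chosen only for illustration; fields() is the helper from the dataclasses module that lists the fields the decorator detected.

```python
from dataclasses import dataclass, fields

@dataclass
class Config:  # hypothetical example class
    host: str          # annotated, so it becomes a field
    port: int = 8080   # annotated with a default, also a field
    retries = 3        # no annotation: a plain class attribute, not a field

cfg = Config("localhost")
field_names = [f.name for f in fields(Config)]

print(field_names)   # ['host', 'port']
print(cfg)           # Config(host='localhost', port=8080)
print(cfg.retries)   # 3, read from the class attribute
```

Because retries is not a field, it does not appear in __init__, __repr__, or __eq__.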
Automatic Methods
Depending on the decorator arguments, @dataclass can generate the following methods. The first three are generated by default:
__init__(self, ...): Initializes the instance with the specified fields.
__repr__(self): Provides a developer-friendly string representation of the object.
__eq__(self, other): Compares two instances of the same class for equality based on their field values.
__hash__(self): Generated if eq and frozen are both true. With the defaults (eq=True, frozen=False) it is set to None, which makes instances unhashable. With eq=False it is left untouched, so the inherited hash is used.
__lt__(self, other), __le__(self, other), __gt__(self, other), __ge__(self, other): Comparison methods, generated only if order=True is specified in the decorator.
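As a minimal sketch of the ordering methods, the hypothetical Version class below uses order=True. The generated comparisons treat each instance like a tuple of its fields in definition order, which is what makes sorted() work here.

```python
from dataclasses import dataclass

@dataclass(order=True)
class Version:  # hypothetical example class
    major: int
    minor: int
    patch: int = 0

releases = [Version(1, 4), Version(1, 2, 9), Version(0, 9)]
ordered = sorted(releases)  # compares (major, minor, patch) tuples

print(ordered[0])                       # Version(major=0, minor=9, patch=0)
print(Version(1, 2) < Version(1, 10))   # True, fields compare as ints
```

Without order=True, the call to sorted() would raise a TypeError because no __lt__ is defined.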
You can control which methods are generated using arguments to the decorator:
@dataclass(eq=True, order=True, repr=True, init=True, frozen=False)
class MyData:
    name: str
    value: int
init=True (default): Generates __init__.
repr=True (default): Generates __repr__.
eq=True (default): Generates __eq__.
order=False (default): If True, generates rich comparison methods (__lt__, __le__, etc.).
unsafe_hash=False (default): If True, forces hash generation even if eq is True and frozen is False. Use with caution.
frozen=False (default): If True, instances of the dataclass cannot be modified after creation.
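How eq and frozen interact with hashing can be checked directly. The three class names below are hypothetical; the sketch only inspects the __hash__ attribute that the decorator leaves on each class.

```python
from dataclasses import dataclass

@dataclass
class Mutable:            # defaults: eq=True, frozen=False
    x: int

@dataclass(frozen=True)
class Frozen:             # eq=True, frozen=True
    x: int

@dataclass(eq=False)
class Identity:           # eq=False: __eq__ and __hash__ are not touched
    x: int

print(Mutable.__hash__ is None)               # True, instances are unhashable
print(hash(Frozen(1)) == hash(Frozen(1)))     # True, hash is based on the fields
print(Identity.__hash__ is object.__hash__)   # True, inherited identity hash
print(Identity(1) == Identity(1))             # False, identity comparison
```

This is why a plain @dataclass instance cannot be placed in a set, while a frozen one can.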
Field Options with field()
The dataclasses.field() function allows for more granular control over individual fields.
from dataclasses import dataclass, field

@dataclass
class User:
    id: int = field(compare=False)  # Don't compare 'id' for equality
    name: str
    email: str = field(repr=False)  # Don't include 'email' in __repr__
    is_active: bool = True  # Field with a default value
    created_at: str = field(default_factory=lambda: "2025-01-01")  # Default produced by a factory, called once per instance

user1 = User(id=1, name="Alice", email="alice@example.com")
user2 = User(id=2, name="Alice", email="alice@example.com")
print(user1)  # Output: User(id=1, name='Alice', is_active=True, created_at='2025-01-01') - email is not in repr
print(user1 == user2)  # Output: True - id is not compared
print(user1.created_at)  # Output: 2025-01-01
Common field() arguments:
default: Sets a default value for the field.
default_factory: A zero-argument function that will be called to provide a default value. Use this for mutable default values (lists, dicts) to avoid sharing the same mutable object across instances.
init=True (default): Include this field in the generated __init__ method.
repr=True (default): Include this field in the generated __repr__ method.
compare=True (default): Include this field in the generated comparison methods (__eq__, __lt__, etc.).
hash=None (default): Include this field when computing the hash. If True, it's included. If False, it's not. If None, its inclusion depends on compare.
metadata=None: A mapping (dict) that can store arbitrary data about the field. Not used by dataclasses itself, but useful for external tools.
kw_only (Python 3.10 and later): If True, the field must be passed as a keyword argument to the constructor. This does not make the field optional; it only changes how the argument is passed, which helps readability at call sites.
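The following sketch illustrates default_factory and metadata from the list above. The Cart and Broken classes and the "unit" metadata key are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field, fields

@dataclass
class Cart:  # hypothetical example class
    owner: str
    items: list = field(default_factory=list)  # a fresh list per instance
    discount: float = field(default=0.0, metadata={"unit": "percent"})

a = Cart("Ann")
b = Cart("Ben")
a.items.append("book")
print(a.items)  # ['book']
print(b.items)  # [], the lists are not shared

# metadata is ignored by dataclasses but can be read back by other tools
discount_field = [f for f in fields(Cart) if f.name == "discount"][0]
print(discount_field.metadata["unit"])  # percent

# A literal mutable default is rejected when the class is created
shared_default_rejected = False
try:
    @dataclass
    class Broken:
        items: list = []
except ValueError:
    shared_default_rejected = True
print(shared_default_rejected)  # True
```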
Post-init Processing: __post_init__
Sometimes, you need to perform additional initialization steps after the standard __init__ method has been run. For this, dataclasses provides the __post_init__ method.
from dataclasses import dataclass

@dataclass
class Circle:
    radius: float
    center_x: float = 0.0
    center_y: float = 0.0
    area: float = 0.0  # This will be calculated in __post_init__

    def __post_init__(self):
        # Calculate area after radius is initialized
        self.area = 3.14159 * self.radius**2

c = Circle(radius=5)
print(c)  # Output: Circle(radius=5, center_x=0.0, center_y=0.0, area=...) with area close to 78.54
# radius prints as 5, not 5.0: type hints are not enforced or converted at runtime
Note that fields initialized in __post_init__ should generally not be included in the __init__ signature if they are derived from other fields. You can achieve this by setting init=False in their field() definition.
from dataclasses import dataclass, field

@dataclass
class Circle:
    radius: float
    center_x: float = 0.0
    center_y: float = 0.0
    area: float = field(init=False, default=0.0)  # area is not in __init__

    def __post_init__(self):
        self.area = 3.14159 * self.radius**2

c = Circle(radius=5)
print(c)
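To confirm that an init=False field really is dropped from the constructor, the signature can be inspected. This is a small sketch with a hypothetical Square class; inspect.signature is from the standard library.

```python
import inspect
from dataclasses import dataclass, field

@dataclass
class Square:  # hypothetical example class
    side: float
    area: float = field(init=False)  # computed, never passed in

    def __post_init__(self):
        self.area = self.side ** 2

params = list(inspect.signature(Square).parameters)
print(params)  # ['side']

sq = Square(3.0)
print(sq)  # Square(side=3.0, area=9.0)

# Passing the derived field explicitly is rejected by the generated __init__
rejected = False
try:
    Square(3.0, area=1.0)
except TypeError:
    rejected = True
print(rejected)  # True
```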
Inheritance
Dataclasses support inheritance. When a dataclass inherits from another dataclass, the fields from the base class are included, and new fields can be added.
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Student(Person):
    student_id: str
    major: str

s = Student(name="Bob", age=20, student_id="S123", major="Computer Science")
print(s)  # Output: Student(name='Bob', age=20, student_id='S123', major='Computer Science')
Important Note on Field Order in Inheritance: Fields are collected from the base classes first, followed by the fields of the derived class, and together they form a single __init__ signature. Every field without a default must therefore come before every field with a default, including defaults inherited from the base class. Python's function signature rules apply.
@dataclass
class Base:
    a: int
    b: int = 10

@dataclass
class Derived(Base):
    # c: int  # Would raise TypeError: 'c' has no default but follows 'b', which has one
    c: int = 0  # Fine: 'c' has a default, so it may follow 'b'
    d: int = 20  # This is fine
In the above example, a bare c: int (a non-default field) would come after b (a default field) in the generated __init__, so the class definition would raise a TypeError even though b is defined in the base class. The rule applies to the combined field list, not to each class separately.
Frozen Dataclasses
If you want to make instances of your dataclass immutable (i.e., their fields cannot be changed after creation), set frozen=True in the decorator.
from dataclasses import dataclass

@dataclass(frozen=True)
class ImmutablePoint:
    x: int
    y: int

p = ImmutablePoint(1, 2)
print(p)
# p.x = 5  # This would raise a dataclasses.FrozenInstanceError
Frozen dataclasses are hashable by default (if all their fields are hashable), making them suitable for use as dictionary keys or in sets.
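A short sketch of that hashability, using hypothetical GridCell and Tagged classes: equal frozen instances collapse in a set and work as dictionary keys, while a frozen instance holding an unhashable field still cannot be hashed.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class GridCell:  # hypothetical example class
    row: int
    col: int

visited = {GridCell(0, 0), GridCell(0, 1), GridCell(0, 0)}
print(len(visited))  # 2, equal cells hash the same

labels = {GridCell(0, 0): "start"}
print(labels[GridCell(0, 0)])  # start

write_blocked = False
try:
    GridCell(0, 0).row = 5
except FrozenInstanceError:
    write_blocked = True
print(write_blocked)  # True

@dataclass(frozen=True)
class Tagged:  # hypothetical: frozen, but holds an unhashable list
    tags: list

unhashable = False
try:
    hash(Tagged(["a"]))
except TypeError:
    unhashable = True
print(unhashable)  # True
```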
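Using replace() with an Immutable (Frozen) Task
Because a frozen instance cannot be modified in place, dataclasses.replace() becomes essential: it builds a new instance that copies the original and changes only the fields you name.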
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)  # <-- This makes it immutable!
class ImmutableTask:
    id: int
    description: str
    is_completed: bool = False
    priority: str = "medium"

# Create an initial immutable task
immut_task1 = ImmutableTask(id=2, description="Review code")
print(f"Original immut_task1: {immut_task1}")
# Output: Original immut_task1: ImmutableTask(id=2, description='Review code', is_completed=False, priority='medium')

# --- Try to modify directly (will fail) ---
try:
    immut_task1.is_completed = True
except FrozenInstanceError as e:
    print(f"\nERROR: Cannot modify frozen task directly: {e}")
# Output: ERROR: Cannot modify frozen task directly: cannot assign to field 'is_completed'

# --- Using replace() (the only way to get a 'modified' version) ---
immut_task2 = replace(immut_task1, is_completed=True, priority="high")
print(f"\nNew immut_task2 (completed, high priority): {immut_task2}")
# Output: New immut_task2 (completed, high priority): ImmutableTask(id=2, description='Review code', is_completed=True, priority='high')
print(f"Original immut_task1 is still unchanged: {immut_task1}")
# Output: Original immut_task1 is still unchanged: ImmutableTask(id=2, description='Review code', is_completed=False, priority='medium')
print(f"Are they the same object? {immut_task1 is immut_task2}")
# Output: Are they the same object? False
What happened:
immut_task1 was created. Because frozen=True, its fields cannot be changed after creation.
Attempting immut_task1.is_completed = True results in an error.
replace(immut_task1, is_completed=True, priority="high") successfully creates a new ImmutableTask object (immut_task2) with the specified changes, while leaving immut_task1 completely intact.
In summary, dataclasses.replace() is your tool to get a slightly different copy of an existing dataclass object, ensuring the original remains pristine. This is especially vital for immutable dataclasses.
Comparison with namedtuple and Regular Classes
Regular Classes: Offer full flexibility but require manual implementation of boilerplate methods.
collections.namedtuple: Provides immutable, tuple-like objects with named fields. They are lightweight and performant. However, they are immutable by design and don't support inheritance as cleanly as dataclasses. They also lack the fine-grained control over field behavior that dataclasses.field() offers.
dataclasses.dataclass: Strikes a balance. It's more flexible than namedtuple (e.g., mutable by default, supports inheritance, __post_init__, field() options) but still significantly reduces boilerplate compared to regular classes.
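A few of these differences show up in a direct side-by-side sketch. PointNT and PointDC are hypothetical names; astuple() is the helper from the dataclasses module.

```python
from collections import namedtuple
from dataclasses import astuple, dataclass

PointNT = namedtuple("PointNT", ["x", "y"])

@dataclass
class PointDC:
    x: int
    y: int

nt = PointNT(1, 2)
dc = PointDC(1, 2)

print(nt == (1, 2))   # True, a namedtuple is a tuple
print(dc == (1, 2))   # False, a dataclass only equals its own class
print(astuple(dc))    # (1, 2), explicit conversion when a tuple is needed

dc.x = 10             # dataclasses are mutable by default
nt_is_immutable = False
try:
    nt.x = 10
except AttributeError:
    nt_is_immutable = True
print(nt_is_immutable)  # True
```

The tuple behavior of namedtuple is convenient for unpacking, but it also means two unrelated namedtuples with the same values compare equal, which a dataclass avoids.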
When to Use dataclass
Use @dataclass when:
You need a class primarily for storing data.
You want to reduce boilerplate code for __init__, __repr__, __eq__, etc.
You need type hints for your data fields.
You might need to extend the class with custom methods later.
You need mutable objects (default) or explicitly immutable objects (frozen=True).
Avoid dataclass if:
Your class has complex logic or behavior that isn't primarily data storage.
You need very high performance where the overhead of dataclasses might be a concern (though usually negligible).
Conclusion
The @dataclass decorator is a fantastic addition to Python, simplifying the creation of data-centric classes. It promotes cleaner, more readable code and encourages the use of type hints, leading to more robust and maintainable applications. By understanding its features like automatic method generation, field() options, __post_init__, and frozen instances, you can leverage its full power to streamline your Python development.
Written by Vigneswaran S