defaultdict


Introduction
The defaultdict, part of the collections module, is a powerful extension of the standard dict class. It overrides one method, __missing__, and adds one writable instance variable, default_factory. If default_factory is None, it behaves like a standard dict and raises a KeyError for missing keys.
>>> from collections import defaultdict
>>> # defaultdict is a subclass of dict
>>> issubclass(defaultdict, dict)
True
>>> # Attributes and methods present in defaultdict but not in a regular dict
>>> set(dir(defaultdict)) - set(dir(dict))
{'__missing__', '__module__', '__copy__', 'default_factory'}
- .__copy__(): Returns a shallow copy of the defaultdict; this is what copy.copy() calls.
- .default_factory: The callable used by __missing__() to auto-generate default values for missing keys.
- .__missing__(key): Called when .__getitem__() can't find key.
- .__module__: Holds the name of the module in which the class is defined (collections in this case).
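Because default_factory is an ordinary writable attribute, you can change it after construction. For example, setting it to None makes later lookups of missing keys raise KeyError like a plain dict, which can be useful once a structure is fully built. A minimal sketch with placeholder keys:

```python
from collections import defaultdict

d = defaultdict(list)
d['a'].append(1)          # 'a' is created with a fresh list
d.default_factory = None  # turn off automatic key creation
try:
    d['missing']
except KeyError:
    print('KeyError raised')  # KeyError raised
print(dict(d))  # {'a': [1]}
```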
The primary purpose of defaultdict is to handle missing keys gracefully by automatically initializing them with a default value. This prevents KeyError exceptions and removes the need for explicit "is this key present?" checks.
It is ideal for tasks such as counting, grouping, or managing complex data structures, whether you're dealing with nested dictionaries, creating frequency counters, or handling large datasets efficiently. Let’s dive into how defaultdict works and explore its real-life use cases!
Internal Working Mechanism
When you access a missing key in a defaultdict with d[key], here's what happens:
1. The __missing__() method of defaultdict is triggered.
2. If a default_factory is set, it is called with no arguments to generate a default value.
3. The generated value is inserted into the dictionary under the missing key.
4. The value is then returned.
If no default_factory is specified (it is None), accessing a missing key raises a KeyError, just like a regular dictionary. Only d[key] lookups go through __missing__(). Methods such as d.get(key) and membership tests like key in d never create new entries.
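The following short sketch shows these steps in action, using only the standard library. The key names are arbitrary placeholders:

```python
from collections import defaultdict

d = defaultdict(list)
print('x' in d)        # False: a membership test does not create the key
print(d.get('x'))      # None: get() does not call __missing__()
value = d['x']         # missing key: __missing__() calls list() and stores the result
print(value, dict(d))  # [] {'x': []}

plain = defaultdict()  # no factory given, so default_factory is None
try:
    plain['y']
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'y'
```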
Default Factory
When you create a defaultdict, you pass a default_factory (e.g., int, list, set, str, or a custom function or lambda). This callable is invoked with no arguments whenever a missing key is accessed, and its return value is stored as the value for that key.
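class DefaultDict(dict):
    # Simplified pure-Python sketch of defaultdict's constructor (CPython implements
    # defaultdict in C); the class name DefaultDict is illustrative
    def __init__(self, default_factory=None, *args, **kwargs):
        # Remaining positional and keyword arguments are passed to dict() unchanged
        super().__init__(*args, **kwargs)
        if not callable(default_factory) and default_factory is not None:
            raise TypeError('first argument must be callable or None')
        self.default_factory = default_factory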
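In this constructor, everything after the factory is forwarded to dict(), and a non-callable factory is rejected up front. Here is a quick check of both behaviours against the real collections.defaultdict:

```python
from collections import defaultdict

# Positional and keyword arguments after the factory are forwarded to dict()
d = defaultdict(int, {'a': 1}, b=2)
print(dict(d))  # {'a': 1, 'b': 2}

# The factory must be callable or None
try:
    defaultdict(0)  # 0 is not callable
except TypeError as exc:
    print(exc)      # first argument must be callable or None
```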
Common default factories:
- int: Returns 0, handy for numeric counters (e.g., counting occurrences). Example:
counter = defaultdict(int)
counter['apple'] += 1
print(counter) # Outputs: defaultdict(<class 'int'>, {'apple': 1})
- list: Returns an empty list, useful for grouping data. Example:
grouped_data = defaultdict(list)
grouped_data['fruits'].append('apple')
print(grouped_data) # Outputs: defaultdict(<class 'list'>, {'fruits': ['apple']})
- set: Returns an empty set, useful for collecting unique items per key. Example:
unique_items = defaultdict(set)
unique_items['fruits'].add('apple')
print(unique_items) # Outputs: defaultdict(<class 'set'>, {'fruits': {'apple'}})
- str: Returns an empty string. Example:
default_string = defaultdict(str)
print(repr(default_string['greeting']))  # Outputs: '' (empty string)
- lambda: A lambda for custom default values. Example:
default_lambda = defaultdict(lambda: 'N/A')
print(default_lambda['unknown']) # Outputs: 'N/A'
- Custom factory function: Any zero-argument function works. Example:
def default_name():
    return 'Unknown'
names = defaultdict(default_name)
print(names['user_123']) # Outputs: 'Unknown'
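To confirm that the factory runs only once per missing key, and never for keys that already exist, you can count its calls. tracking_factory below is a hypothetical helper written only for this demonstration:

```python
from collections import defaultdict

calls = 0

def tracking_factory():
    # Hypothetical helper: counts how many times defaultdict invokes it
    global calls
    calls += 1
    return 0

d = defaultdict(tracking_factory)
d['a']       # missing: factory called, 'a' -> 0
d['a']       # present: factory not called again
d['b'] += 5  # missing: factory called, then 0 + 5 is stored
print(calls, dict(d))  # 2 {'a': 0, 'b': 5}
```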
__missing__ method:
When you access a key that doesn't exist in the dictionary with d[key], Python calls the __missing__ method. In defaultdict, this method calls default_factory to generate a default value, inserts the key into the dictionary with that value, and returns it.
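class DefaultDict(dict):
    # Simplified pure-Python sketch of defaultdict (CPython implements it in C);
    # the class name DefaultDict is illustrative
    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        if not callable(default_factory) and default_factory is not None:
            raise TypeError('first argument must be callable or None')
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__ only when key is absent
        if self.default_factory is None:
            raise KeyError(key)
        # Build the default, store it under key, and return it
        self[key] = value = self.default_factory()
        return value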
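Because default_factory receives no arguments, it cannot base the default on the missing key. Any dict subclass can define __missing__() itself when it needs that. KeyLengthDict below is a hypothetical example that stores each key's length as its default:

```python
class KeyLengthDict(dict):
    # Hypothetical example: the default value depends on the missing key
    def __missing__(self, key):
        value = len(key)
        self[key] = value  # store it, mirroring what defaultdict does
        return value

d = KeyLengthDict()
print(d['hello'])  # 5
print(d)           # {'hello': 5}
```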
Performance Considerations
defaultdict adds overhead only when a key is missing, because that is when __missing__() runs and the factory is called. Lookups of existing keys use the same dict.__getitem__ as a regular dictionary. In most code this difference is negligible and is outweighed by improved readability and reduced complexity.
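If performance matters for your workload, measure it rather than assume. This sketch times a counting loop written with dict.get() against one written with defaultdict(int). The word list is made-up sample data, and the printed timings will vary with your machine and Python version:

```python
import timeit
from collections import defaultdict

# Made-up sample data for the benchmark
words = ['apple', 'banana', 'apple', 'cherry', 'banana', 'apple'] * 1000

def count_with_dict():
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

def count_with_defaultdict():
    counts = defaultdict(int)
    for w in words:
        counts[w] += 1
    return counts

# Both approaches must agree before their speed is worth comparing
assert count_with_dict() == count_with_defaultdict()
for fn in (count_with_dict, count_with_defaultdict):
    print(fn.__name__, timeit.timeit(fn, number=50))
```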
Practical Use Cases for Developers
1. Grouping Data
from collections import defaultdict
students = [
    {'name': 'Alice', 'grade': 'A'},
    {'name': 'Bob', 'grade': 'B'},
    {'name': 'Charlie', 'grade': 'A'}
]
# Group students by grade
grade_groups = defaultdict(list)
for student in students:
    grade_groups[student['grade']].append(student['name'])
# Output:
# {'A': ['Alice', 'Charlie'], 'B': ['Bob']}
print(dict(grade_groups))
2. Counting Elements
from collections import defaultdict
def count_word_frequency(text):
    word_freq = defaultdict(int)
    for word in text.split():
        word_freq[word] += 1
    return word_freq
text = "the quick brown fox jumps over the lazy dog"
frequencies = count_word_frequency(text)
# Output:
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1,
# 'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1}
print(dict(frequencies))
3. Nested Dictionaries
from collections import defaultdict
# Simplified nested structure management
user_data = defaultdict(lambda: defaultdict(list))
user_data['users']['john'].append('email')
user_data['users']['jane'].append('phone')
# Output (dict() converts only the outer level):
# {'users': defaultdict(<class 'list'>, {'john': ['email'], 'jane': ['phone']})}
print(dict(user_data))
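The lambda above fixes the depth at two levels. For arbitrarily deep nesting, a common idiom is a factory that returns a defaultdict of itself. In this sketch, tree and to_dict are hypothetical helper names, and to_dict converts the result back to plain dicts for printing or JSON:

```python
from collections import defaultdict

def tree():
    # Each missing key creates another tree, so nesting depth is unlimited
    return defaultdict(tree)

def to_dict(d):
    # Recursively convert nested defaultdicts to plain dicts
    if isinstance(d, defaultdict):
        return {k: to_dict(v) for k, v in d.items()}
    return d

config = tree()
config['db']['primary']['host'] = 'localhost'
config['db']['primary']['port'] = 5432
print(to_dict(config))  # {'db': {'primary': {'host': 'localhost', 'port': 5432}}}
```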
4. Caching Computations
from collections import defaultdict
def fibonacci_with_cache():
    # Note: `n not in cache` never triggers __missing__(), so the int factory is
    # never used here; a plain dict would behave identically
    cache = defaultdict(int)
    def fib(n):
        if n < 2:
            return n
        if n not in cache:
            cache[n] = fib(n-1) + fib(n-2)
        return cache[n]
    return fib
fibonacci = fibonacci_with_cache()
# Output:
# 8 (Fibonacci number for n=6)
# 0 1 1 2 3 5 8
print(fibonacci(6))
print(' '.join(str(fibonacci(i)) for i in range(7)))
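If you want the missing-key hook itself to do the caching, a dict subclass can compute and store the value inside __missing__(). This is a variation on the technique rather than defaultdict itself, since default_factory cannot see the key. FibCache is a hypothetical name, and very large n would hit Python's recursion limit:

```python
class FibCache(dict):
    # Hypothetical memo table: a missing n is computed once, stored, and reused
    def __missing__(self, n):
        value = n if n < 2 else self[n - 1] + self[n - 2]
        self[n] = value
        return value

fib = FibCache()
print(fib[6])                                   # 8
print(' '.join(str(fib[i]) for i in range(7)))  # 0 1 1 2 3 5 8
```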
5. Handling JSON Data
import json
from collections import defaultdict
data = json.loads('{"user": {"name": "John"}}')
default_data = defaultdict(lambda: 'Unknown', data['user'])
print(default_data['name']) # Outputs: John
print(default_data['age']) # Outputs: Unknown
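One side effect to keep in mind with this pattern is that reading a missing key with [] inserts the placeholder into the dictionary, and it then shows up if you serialize the data. dict.get() with an explicit default reads without inserting:

```python
import json
from collections import defaultdict

data = json.loads('{"user": {"name": "John"}}')
user = defaultdict(lambda: 'Unknown', data['user'])

print(user.get('email', 'Unknown'))  # Unknown ('email' is not inserted)
print(user['age'])                   # Unknown ('age' is inserted)
print(json.dumps(user))              # {"name": "John", "age": "Unknown"}
```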
6. Handling Mutable Defaults (Avoid mutable default traps)
from collections import defaultdict
def create_list():
    # Dedicated factory function: returns a new list on every call
    return []
def demonstrate_mutable_defaults():
    # Safe: defaultdict calls list() for every new key, so each key gets its own list.
    # (defaultdict([]) would raise TypeError: the factory must be a callable.)
    safe_builtin = defaultdict(list)
    # Also safe, and equivalent: a lambda that builds a new list on each call
    safe_lambda = defaultdict(lambda: [])
    # Also safe: a dedicated factory function
    safe_factory = defaultdict(create_list)
    for d in (safe_builtin, safe_lambda, safe_factory):
        d['team1'].append('Alice')
        d['team2'].append('Bob')
    # The real trap: a factory that returns the SAME list object every time
    shared = []
    trap = defaultdict(lambda: shared)
    trap['team1'].append('Alice')
    trap['team2'].append('Bob')
    # Output:
    # Safe list type: {'team1': ['Alice'], 'team2': ['Bob']}
    # Safe lambda: {'team1': ['Alice'], 'team2': ['Bob']}
    # Safe factory function: {'team1': ['Alice'], 'team2': ['Bob']}
    # Shared-list trap: {'team1': ['Alice', 'Bob'], 'team2': ['Alice', 'Bob']}
    print("Safe list type:", dict(safe_builtin))
    print("Safe lambda:", dict(safe_lambda))
    print("Safe factory function:", dict(safe_factory))
    print("Shared-list trap:", dict(trap))
demonstrate_mutable_defaults()
Conclusion
- defaultdict eliminates manual checks for missing keys, which reduces boilerplate and improves readability.
- It works by invoking default_factory from the __missing__() method, which dict.__getitem__() calls when a key is absent.
- It is ideal for grouping, counting, caching, and handling nested data.
- Thread safety: defaultdict adds no synchronization of its own. Compound updates such as d[key] += 1 are not atomic, so guard a defaultdict shared between threads with a lock.
- Stick to a plain dict when missing keys should raise KeyError, or when you need minimal overhead (rare cases).
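Here is a minimal sketch of the lock pattern, in which several threads update one shared counter. The worker function and the sample word lists are hypothetical:

```python
import threading
from collections import defaultdict

counts = defaultdict(int)
lock = threading.Lock()

def worker(words):
    # Hypothetical worker: `counts[w] += 1` is a read-modify-write,
    # so the lock makes each update atomic with respect to other threads
    for w in words:
        with lock:
            counts[w] += 1

threads = [threading.Thread(target=worker, args=(['a', 'b', 'a'] * 1000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(dict(counts))  # {'a': 8000, 'b': 4000}
```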
Written by Ishwor Kafle
Experienced Software Engineer with 9+ years of expertise in designing and building scalable, high-performance solutions. Proficient in Python, Java, and JavaScript, I specialize in backend architectures, microservices, and containerized systems. Passionate about clean code, system optimization, and cutting-edge technologies, I thrive in solving complex problems with elegant solutions. When I'm not coding, I enjoy exploring movies, series, and FPS games. Let's connect and build the future—one line of code at a time.