defaultdict

Ishwor Kafle

Introduction

The defaultdict class, part of the collections module, is a powerful extension of the standard dict class. It overrides one method, __missing__(), and adds one writable instance variable, default_factory. If default_factory is None, it behaves like a standard dict and raises a KeyError for missing keys.

from collections import defaultdict

# defaultdict is a subclass of dict.
>>> issubclass(defaultdict, dict)
True

# Returns the set of attributes and methods that are present in defaultdict
# but not in a regular dict
>>> set(dir(defaultdict)) - set(dir(dict))
{'__missing__', '__module__', '__copy__', 'default_factory'}
  • .__copy__() : Returns a shallow copy of the defaultdict; this is the hook that copy.copy() calls.

  • .default_factory : Callable used by __missing__() to auto-generate default values for missing keys.

  • .__missing__(key) : Gets called when .__getitem__() can’t find key

  • .__module__ : holds the module in which the class is defined (collections in this case).

The primary purpose of defaultdict is to handle missing keys gracefully by automatically initializing them with a default value. This avoids KeyError exceptions and the explicit existence checks they would otherwise require.

It is ideal for tasks such as counting, grouping, or managing complex data structures, whether you're dealing with nested dictionaries, creating frequency counters, or handling large datasets efficiently. Let’s dive into how defaultdict works and explore its real-life use cases!

Internal Working Mechanism

When you try to access a missing key in a defaultdict, here's what happens:

  • The __missing__() method of defaultdict is triggered.

  • If a default_factory is set, it is called to generate a default value.

  • The generated default value is assigned to the missing key in the dictionary.

  • The default value is then returned.

If no default_factory is specified, attempting to access a missing key raises a KeyError, just like a regular dictionary.
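The steps above can be observed in a short script. This is a minimal sketch with illustrative variable names. Note that only subscript access (d[key]) goes through __missing__(); .get() and the in operator do not create entries.

```python
from collections import defaultdict

scores = defaultdict(list)

# .get() and `in` never call __missing__(), so nothing is inserted
assert scores.get('alice') is None
assert 'alice' not in scores
assert len(scores) == 0

# Subscript access calls __missing__(), which stores and returns list()
value = scores['alice']
assert value == []
assert 'alice' in scores
assert scores['alice'] is value   # the stored object is the one returned

# With default_factory set to None, a missing key raises KeyError
plain = defaultdict(None)
try:
    plain['bob']
except KeyError as exc:
    missing_key_error = exc

print(repr(missing_key_error))    # KeyError('bob')
```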

Default Factory

When you create a defaultdict, you provide a default_factory callable (e.g., int, list, set, str, a custom function, or a lambda). It is called with no arguments whenever a missing key is accessed with d[key], and its return value is stored and used as the default value for that key.

# Simplified pure-Python sketch of the constructor
# (CPython implements defaultdict in C)
def __init__(self, default_factory=None, *args, **kwargs):
    if default_factory is not None and not callable(default_factory):
        raise TypeError('first argument must be callable or None')

    super().__init__(*args, **kwargs)
    self.default_factory = default_factory
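
Because default_factory is an ordinary writable attribute, it can be changed after construction. One possible pattern, sketched here with made-up names, is to set it to None once the data has been built, so that a later typo raises KeyError instead of silently creating an entry.

```python
from collections import defaultdict

inventory = defaultdict(int)
inventory['apple'] += 3
inventory['pear'] += 1

# Switch off auto-creation: the object now behaves like a plain dict
inventory.default_factory = None

try:
    inventory['aple']          # misspelled key
except KeyError:
    lookup_failed = True
else:
    lookup_failed = False

print(lookup_failed)           # True
print(sorted(inventory))       # ['apple', 'pear']
```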

Common Default Factories:

  • int: Returns 0, which suits numeric counters (e.g., counting occurrences). Example:
counter = defaultdict(int)
counter['apple'] += 1
print(counter)  # Outputs: defaultdict(<class 'int'>, {'apple': 1})
  • list: Returns an empty list, which suits grouping data. Example:
grouped_data = defaultdict(list)
grouped_data['fruits'].append('apple')
print(grouped_data)  # Outputs: defaultdict(<class 'list'>, {'fruits': ['apple']})
  • set: Returns an empty set, which suits collecting unique items per key. Example:
unique_items = defaultdict(set)
unique_items['fruits'].add('apple')
print(unique_items)  # Outputs: defaultdict(<class 'set'>, {'fruits': {'apple'}})
  • str: Returns an empty string. Example:
default_string = defaultdict(str)
print(repr(default_string['greeting']))  # Outputs: ''
  • Lambda functions for constant or more complex default values. Example:
default_lambda = defaultdict(lambda: 'N/A')
print(default_lambda['unknown'])  # Outputs: N/A
  • Custom factory function. Example:
def default_name():
    return 'Unknown'

names = defaultdict(default_name)
print(names['user_123'])  # Outputs: Unknown

__missing__ method:

When a subscript lookup such as d[key] does not find the key, dict.__getitem__() calls the __missing__() method if the class defines one. In defaultdict, this method calls default_factory to generate a default value, stores it under the key, and returns it. Other lookups, such as .get() and the in operator, do not call __missing__().

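# Simplified pure-Python sketch; the real defaultdict is implemented in C.
# The class name DefaultDictSketch is illustrative only.
class DefaultDictSketch(dict):
    def __init__(self, default_factory=None, *args, **kwargs):
        if default_factory is not None and not callable(default_factory):
            raise TypeError('first argument must be callable or None')

        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__() only when key is absent
        if self.default_factory is None:
            raise KeyError(key)

        # Build the default, store it under the key, and return it
        value = self.default_factory()
        self[key] = value
        return value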
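
Since default_factory is called with no arguments, it cannot see which key was requested. When the default should depend on the key, one option is to subclass dict and define __missing__() directly. The class name KeyAwareDict and its key_factory parameter below are hypothetical and not part of the standard library.

```python
class KeyAwareDict(dict):
    """Hypothetical dict subclass whose default is computed from the key."""

    def __init__(self, key_factory, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.key_factory = key_factory

    def __missing__(self, key):
        # dict.__getitem__() calls this only when key is absent
        value = self.key_factory(key)
        self[key] = value
        return value


squares = KeyAwareDict(lambda n: n * n)
print(squares[4])        # 16
print(dict(squares))     # {4: 16}
```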

Performance Considerations

defaultdict adds a small cost only when a missing key is accessed, because the factory has to be called and its result stored. Lookups of keys that already exist take the same path as in a regular dict. The difference is negligible in most use cases and is usually outweighed by improved code readability and reduced complexity.
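
To check the overhead on your own workload, a rough timing comparison like the following can help. It is only a sketch: the helper names and sample data are made up, and absolute timings depend on the machine and Python version, so none are shown here.

```python
import timeit
from collections import defaultdict

words = ['apple', 'pear', 'fig', 'apple', 'fig', 'apple'] * 1000

def count_with_defaultdict():
    counts = defaultdict(int)
    for word in words:
        counts[word] += 1
    return counts

def count_with_dict_get():
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Both approaches must agree before the timings mean anything
assert dict(count_with_defaultdict()) == count_with_dict_get()

for func in (count_with_defaultdict, count_with_dict_get):
    seconds = timeit.timeit(func, number=50)
    print(f'{func.__name__}: {seconds:.4f}s')
```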

Practical Use Cases for Developers

1. Grouping Data

students = [
    {'name': 'Alice', 'grade': 'A'},
    {'name': 'Bob', 'grade': 'B'},
    {'name': 'Charlie', 'grade': 'A'}
]

# Group students by grade
grade_groups = defaultdict(list)
for student in students:
    grade_groups[student['grade']].append(student['name'])

# Output:
# {'A': ['Alice', 'Charlie'], 'B': ['Bob']}
print(dict(grade_groups))

2. Counting Elements

from collections import defaultdict

def count_word_frequency(text):
    word_freq = defaultdict(int)
    for word in text.split():
        word_freq[word] += 1
    return word_freq

text = "the quick brown fox jumps over the lazy dog"
frequencies = count_word_frequency(text)

# Output:
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 
#  'jumps': 1, 'over': 1, 'lazy': 1, 'dog': 1}
print(dict(frequencies))

3. Nested Dictionaries

from collections import defaultdict

# Simplified nested structure management
user_data = defaultdict(lambda: defaultdict(list))
user_data['users']['john'].append('email')
user_data['users']['jane'].append('phone')

# Output (the outer defaultdict becomes a plain dict; the inner one does not):
# {'users': defaultdict(<class 'list'>,
#                       {'john': ['email'], 'jane': ['phone']})}
print(dict(user_data))
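
When the nesting depth is not known in advance, a factory that returns another defaultdict of the same kind allows arbitrary depth. The following is a sketch; make_tree and to_plain_dict are illustrative helper names, not library functions.

```python
from collections import defaultdict

def make_tree():
    # Every missing key creates another tree one level down
    return defaultdict(make_tree)

def to_plain_dict(node):
    # Recursively convert nested defaultdicts into regular dicts
    if isinstance(node, defaultdict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node

config = make_tree()
config['server']['http']['port'] = 8080
config['server']['http']['host'] = 'localhost'

print(to_plain_dict(config))
# {'server': {'http': {'port': 8080, 'host': 'localhost'}}}
```

Converting back to plain dicts before handing the data to other code avoids accidental key creation on later lookups.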

4. Caching Computations

from collections import defaultdict

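def fibonacci_with_cache():
    # Membership is checked explicitly below, so the int default is never
    # used here; a plain dict would behave the same way.
    cache = defaultdict(int)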
    def fib(n):
        if n < 2:
            return n
        if n not in cache:
            cache[n] = fib(n-1) + fib(n-2)
        return cache[n]
    return fib

fibonacci = fibonacci_with_cache()

# Output:
# 8 (Fibonacci number for n=6)
# 0 1 1 2 3 5 8
print(fibonacci(6))
print(' '.join(str(fibonacci(i)) for i in range(7)))

5. Handling JSON Data

import json
from collections import defaultdict

data = json.loads('{"user": {"name": "John"}}')
default_data = defaultdict(lambda: 'Unknown', data['user'])

print(default_data['name'])  # Outputs: John
print(default_data['age'])   # Outputs: Unknown
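
One side effect to keep in mind: a subscript lookup of a missing key stores the default, so it will appear if the data is serialized again. The sketch below, using the same sample document, contrasts that with .get(), which leaves the dictionary unchanged.

```python
import json
from collections import defaultdict

data = json.loads('{"user": {"name": "John"}}')
user = defaultdict(lambda: 'Unknown', data['user'])

# .get() with an explicit fallback does not modify the dictionary
print(user.get('age', 'Unknown'))   # Unknown
print(json.dumps(user))             # {"name": "John"}

# Subscript access inserts the default as a real entry
print(user['age'])                  # Unknown
print(json.dumps(user))             # {"name": "John", "age": "Unknown"}
```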

6. Handling Mutable Defaults (Avoid Mutable Default Traps)


from collections import defaultdict

# Safe: the factory is called once per missing key, so each key gets its
# own new list. The three forms below are equivalent.
safe_builtin = defaultdict(list)
safe_lambda = defaultdict(lambda: [])

def create_list():
    return []

safe_factory = defaultdict(create_list)

# The trap: a factory that hands back the same mutable object every time
shared_list = []
problematic = defaultdict(lambda: shared_list)

# Example demonstrating the difference
def demonstrate_safe_lists():
    for teams in (safe_builtin, safe_lambda, safe_factory, problematic):
        teams['team1'].append('Alice')
        teams['team2'].append('Bob')

    # Output:
    # Safe built-in list: {'team1': ['Alice'], 'team2': ['Bob']}
    # Safe lambda: {'team1': ['Alice'], 'team2': ['Bob']}
    # Safe factory function: {'team1': ['Alice'], 'team2': ['Bob']}
    # Problematic shared list: {'team1': ['Alice', 'Bob'], 'team2': ['Alice', 'Bob']}
    print("Safe built-in list:", dict(safe_builtin))
    print("Safe lambda:", dict(safe_lambda))
    print("Safe factory function:", dict(safe_factory))
    print("Problematic shared list:", dict(problematic))

demonstrate_safe_lists()

Conclusion

  • defaultdict eliminates manual checks for missing keys, which improves readability and reduces boilerplate.

  • It works by invoking default_factory via the __missing__() method.

  • Ideal for grouping, counting, caching, and handling complex data.

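  • Thread safety: defaultdict adds no synchronization of its own. Compound updates such as d[key] += 1 are not atomic, so use a lock when several threads modify the same instance.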

  • Stick to dict when:

    • Missing keys should raise KeyError.

    • You need minimal overhead (rare cases).
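
For the thread-safety point, a read-modify-write such as counts[key] += 1 takes several steps, so concurrent updates can be lost. A minimal sketch of guarding updates with a threading.Lock follows; the function and variable names are illustrative.

```python
import threading
from collections import defaultdict

counts = defaultdict(int)
counts_lock = threading.Lock()

def record(event, times):
    for _ in range(times):
        # Hold the lock for the whole read-modify-write step
        with counts_lock:
            counts[event] += 1

threads = [threading.Thread(target=record, args=('click', 10_000))
           for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counts['click'])   # 40000
```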

