Generators in Python
Table of contents
Introduction
Imagine you have a big box of toys, but you can only play with one toy at a time. Generators are like a special magic box that gives you toys one by one, exactly when you ask for them.
Instead of getting all the toys at once and filling up your room with them, the generator box only gives you one toy when you say "Give me a toy." When you're done playing with that toy, you can ask for another one, and the box will give you a new toy.
Generators are great because they save space. If you had to keep all the toys in your room, it would get very messy and crowded. But with the generator box, you only have one toy at a time, so your room stays clean and you can still play with all the toys in the big box.
Generators are also helpful when you don't know how many toys you have or if you want to play with an endless number of toys. The generator box can keep giving you toys forever, and you can keep playing with them as long as you want.
So, generators are like magic boxes that give you toys one by one when you ask for them, keeping your room tidy and letting you play with lots of toys without taking up too much space.
Technical explanation
In Python, a generator is a type of iterable, similar to lists or tuples, but with a crucial difference. While lists store all their values in memory at once, generators generate values on the fly as requested, which makes them more memory-efficient and allows them to handle large amounts of data.
Generators are defined using a special syntax that involves the use of the yield
keyword. A function that contains a yield
statement is called a generator function. When a generator function is called, it returns an iterator object, which can be iterated over to retrieve the generated values.
Here's an example of a simple generator function that generates a sequence of numbers:
def number_generator(n: int):
for i in range(n):
yield i
# Using the generator function
generator = number_generator(5)
for num in generator:
print(num)
In this example, the number_generator
function generates numbers from 0 to n - 1
. When the function is called, it doesn't execute immediately. Instead, it returns a generator object. The actual execution of the function is deferred until the generator object is iterated over. Each time the yield
statement is encountered, it pauses the function, saves its state, and returns the yielded value. The next time the generator is iterated over, it resumes execution from where it left off.
Generators are particularly useful when dealing with large datasets or when generating an infinite stream of values. Since they generate values on demand, they can significantly reduce memory usage and improve performance compared to loading all the values into memory at once.
In addition to using the yield
statement, generators can also be created using generator expressions, which have a syntax similar to list comprehensions but with parentheses instead of square brackets.
# Generator expression
generator = (x for x in range(10) if x % 2 == 0)
for num in generator:
print(num)
In this example, the generator expression generates even numbers from 0 to 9. The values are generated one by one as requested in the for
loop.
Overall, generators provide a convenient way to generate values lazily, making them an efficient and powerful tool in Python programming.
Why use it?
Generators offer several advantages and use cases that make them valuable in Python programming:
Memory efficiency: Generators generate values on the fly, allowing you to work with large datasets without loading all the data into memory at once. This is especially useful when dealing with large files or infinite sequences.
Performance: Since generators produce values on demand, they can save computation time by generating only the necessary values. This can lead to improved performance, especially when dealing with large datasets or complex calculations.
Simplified code: Generators provide a clean and concise way to express complex logic or calculations. They allow you to write code that looks like it produces all the values at once (similar to a list comprehension), while actually generating them lazily.
Infinite sequences: Generators can be used to represent infinite sequences or streams of data. Since they generate values on demand, you can work with sequences that are too large to fit into memory or sequences that continue indefinitely.
Iteration support: Generators are iterable objects, which means you can easily iterate over the values they generate using a
for
loop or by using them in other constructs that expect iterable objects, such as list comprehensions orsum()
.Stateful computation: Generators can maintain their internal state between iterations. This allows you to write functions that remember their previous state and produce values based on that state. It's useful when dealing with computations that require maintaining context or keeping track of progress.
Conserve resources: By generating values on the fly, generators help to conserve system resources such as CPU cycles and memory. They avoid the need to pre-calculate and store all the values, which can be crucial in scenarios with limited resources or when working with large datasets.
Examples
- Reading a large file line by line:
def read_file(file_path):
with open(file_path, 'r') as file:
for line in file:
yield line.strip()
# Using the file reader generator
file_gen = read_file('large_file.txt')
for line in file_gen:
print(line)
Here, the read_file
generator function reads a large file line by line. Instead of loading the entire file into memory, it yields one line at a time, allowing you to process the file efficiently, even if it's too large to fit into memory.
- Filtering elements:
def even_numbers(numbers):
for num in numbers:
if num % 2 == 0:
yield num
# Using the even number filter generator
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_gen = even_numbers(numbers)
for num in even_gen:
print(num)
In this example, the even_numbers
generator function takes a list of numbers and yields only the even numbers. It filters out odd numbers and generates only the desired values, making it easy to work with specific subsets of data.
Takeaway
These examples illustrate the flexibility and power of generators. They allow you to generate infinite sequences, read large files efficiently, filter data on the fly, and perform other tasks that require generating values lazily. A generator is a versatile tool that can simplify complex tasks and optimize resource usage in Python programming. Overall, generators provide a powerful and efficient way to work with sequences of data, particularly when dealing with large or infinite datasets, and when memory efficiency or performance optimizations are important considerations.
Subscribe to my newsletter
Read articles from Nikhil Akki directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Nikhil Akki
Nikhil Akki
I am a Full Stack Solution Architect at Deloitte LLP. I help build production grade web applications on major public clouds - AWS, GCP and Azure.