Yield and we will spare your lives!

Vivek Khatri

If you have programmed in Python, you have probably come across the keyword `yield`. So what does it do? It YIELDS!

Have you ever watched a script grind to a halt because it tried to load a huge file into RAM all at once? So what do we do?

We do not Yield to the machine overlord. We try to become efficient, and don’t let RAM limit our ambitions.

So `yield` turns a function into a generator: instead of building the whole result in memory, it hands back the data one piece at a time, in our case, line by line.
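To see the mechanics in isolation, here is a minimal sketch (the function `count_up_to` is just an illustrative name, not part of the test below):

```python
def count_up_to(n):
    """A generator: each yield hands back one value, then the function pauses."""
    i = 1
    while i <= n:
        yield i  # execution suspends here until the caller asks for more
        i += 1


gen = count_up_to(3)
print(next(gen))  # 1
print(next(gen))  # 2
print(list(gen))  # [3] -- remaining values produced on demand
```

Nothing runs until the caller iterates: calling `count_up_to(3)` only creates the generator object.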

Let us do a simple test.

```python
import time
import os
import psutil
import random


# Without generators (reading the entire file into memory)
def read_file_without_generator(filename):
    with open(filename, "r") as file:
        lines = file.readlines()  # Loads the entire file into memory
    return lines


# With generators (using yield)
def read_file_with_generator(filename):
    with open(filename, "r") as file:
        for line in file:  # Reads one line at a time
            yield line.strip()


# Create a large test file (~100 MB)
def create_large_file(filename, size_mb=100):
    with open(filename, "w") as f:
        for _ in range(size_mb * 10000):  # 10,000 lines of ~101 bytes each ≈ 1 MB
            f.write(
                "".join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(100))
                + "\n"
            )


# Current memory usage of this process, in MB
def get_memory_usage():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024


# Test without generator
def test_without_generator(filename):
    start_time = time.time()
    start_mem = get_memory_usage()

    lines = read_file_without_generator(filename)
    count = sum(1 for line in lines if "a" in line)  # Count lines containing 'a'

    end_time = time.time()
    end_mem = get_memory_usage()

    return {
        "time": end_time - start_time,
        "memory": end_mem - start_mem,
        "count": count,
    }


# Test with generator
def test_with_generator(filename):
    start_time = time.time()
    start_mem = get_memory_usage()

    lines = read_file_with_generator(filename)
    count = sum(1 for line in lines if "a" in line)  # Count lines containing 'a'

    end_time = time.time()
    end_mem = get_memory_usage()

    return {
        "time": end_time - start_time,
        "memory": end_mem - start_mem,
        "count": count,
    }


# Run the comparison
temp_file = "large_test_file.txt"
create_large_file(temp_file)

print("Testing without generator...")
result1 = test_without_generator(temp_file)

print("Testing with generator...")
result2 = test_with_generator(temp_file)

print("\nResults:")
print(f"Without generator: {result1['time']:.2f} seconds, {result1['memory']:.2f} MB")
print(f"With generator: {result2['time']:.2f} seconds, {result2['memory']:.2f} MB")
print(f"Memory saved: {result1['memory'] - result2['memory']:.2f} MB")
print(f"Time difference: {result1['time'] - result2['time']:.2f} seconds")
```

It is more like lazy loading: the generator does not read anything until the consuming loop asks for the next line.
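You can check the laziness directly. This sketch reuses `read_file_with_generator` and the test file from above; note that even the `open()` call does not happen until the first value is requested:

```python
lines = read_file_with_generator("large_test_file.txt")
print(lines)        # <generator object ...> -- the file has not been touched yet
print(next(lines))  # only now is the file opened and the first line read
```

Running the full comparison script gives: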

```
Testing without generator...
Testing with generator...

Results:
Without generator: 0.11 seconds, 153.16 MB
With generator: 0.13 seconds, 0.19 MB
Memory saved: 152.97 MB
Time difference: -0.01 seconds
```

There was barely any difference in execution time, but look at how much memory the eager version hogged: about 153 MB, versus 0.19 MB with the generator.

The general idea is to process the file in chunks instead of loading it all at once, and `yield` is the natural tool for that in Python.
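If the data is not line-oriented, the same trick works with fixed-size chunks. A minimal sketch, with the chunk size picked arbitrarily:

```python
def read_in_chunks(filename, chunk_size=64 * 1024):
    """Yield fixed-size chunks so only one chunk is in memory at a time."""
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            yield chunk


total = sum(len(chunk) for chunk in read_in_chunks("large_test_file.txt"))
print(f"Read {total / 1024 / 1024:.2f} MB without ever holding the file in memory")
```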
