Which "zip" should you use?

When coding in Python, you will often work with iterables: objects you can "iterate" over, which hand back their items one at a time. Iterables act as "containers" of sorts, holding the data you want to step through. Python lets you work with several iterables at once, and in this article I will look at two functions that combine elements from multiple iterables.

Let's begin.

The zip() function

The zip() function takes iterables (such as lists, strings, tuples, etc.), usually two or more, and "zips" them together. What that means is simple: the first item from the first iterable is paired with the first item from the second iterable, then the second items are paired, then the third, and so on.

Let's see it in action. Say we have two lists, A and B. A is a list of the numbers from 1 to 10. B is a list of the first 10 primes: 2, 3, 5, 7, 11, and so on.

How the function works is very simple: take the first element of A (1) and pair it with the first element of B (2), then continue (2 gets paired with 3, 3 gets paired with 5, and so on) until one of the lists is exhausted.

This means that if the input iterables are of different lengths, the resulting iterator yields only as many tuples as the shortest iterable has items.

But there's one more thing to consider before we dive right into an example.

Since we are creating pairs (or rather "groups") of elements, where do these groups go?

Yes, that's right: a tuple. The zip() function aggregates elements from multiple iterables, creating an iterator of tuples.
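To make the A and B lists described earlier concrete, here is a minimal sketch. The variable names `pairs`, `first`, and `rest` are my own, chosen just for illustration.

```python
# A: the numbers 1 to 10; B: the first 10 primes (as described above)
A = list(range(1, 11))
B = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

pairs = zip(A, B)      # an iterator; tuples are produced on demand

first = next(pairs)    # pull the first tuple from the iterator
print(first)           # (1, 2)
print(type(first))     # <class 'tuple'>

rest = list(pairs)     # collect the remaining tuples
print(rest[:2])        # [(2, 3), (3, 5)]
print(len(rest))       # 9
```

Each group really is a tuple, and the iterator hands them out one at a time until it runs dry.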

Here's the syntax, along with an example to illustrate:

# Syntax
zip(iterable_1, ..., iterable_n)

# Here's an example:
# Using zip() with equal-length iterables
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]

zipped_result = zip(names, ages)
result = list(zipped_result)

print(result)
# Output: [('Alice', 25), ('Bob', 30), ('Charlie', 35)]

In this example, we have two lists whose elements are zipped together. Because zip() returns an iterator, we pass it to list() and store the resulting list of tuples in a variable called result.
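You don't have to build a list first. As a rough sketch, the same two lists can be consumed directly in a for loop by unpacking each tuple, or handed to dict() to build a lookup table. The names `lines` and `age_by_name` are just illustrative.

```python
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 35]

# Unpack each (name, age) tuple as the loop goes
lines = []
for name, age in zip(names, ages):
    lines.append(f"{name} is {age} years old")

print(lines)
# ['Alice is 25 years old', 'Bob is 30 years old', 'Charlie is 35 years old']

# dict() accepts an iterable of (key, value) pairs, which is what zip() yields here
age_by_name = dict(zip(names, ages))
print(age_by_name)
# {'Alice': 25, 'Bob': 30, 'Charlie': 35}
```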

But what if the iterables are not of the same length? Well, quite simply, the resulting combination will only be as long as the shortest iterable that was passed.

Let's examine:

names = ['Alice', 'Bob', 'Charles', 'Daniel'] # length of 4
ages = [25, 30, 35, 40, 45, 50] # length of 6

zip_2 = zip(names, ages)
result = list(zip_2)

print(result)

# Output: [('Alice', 25), ('Bob', 30), ('Charles', 35), ('Daniel', 40)]

Any items beyond the length of the shortest iterable are silently left out; here, the ages 45 and 50 never appear in the result.
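Here is a small sketch that checks this behaviour on the same two lists. The `leftover` variable is something I've added purely to show which items were dropped.

```python
names = ['Alice', 'Bob', 'Charles', 'Daniel']  # length of 4
ages = [25, 30, 35, 40, 45, 50]                # length of 6

result = list(zip(names, ages))

# The result is exactly as long as the shorter input
print(len(result) == min(len(names), len(ages)))  # True

# Whatever sits past that point in the longer list is not used
leftover = ages[len(result):]
print(leftover)  # [45, 50]
```

Note that zip() gives no warning about the dropped items, so this is worth checking by hand when the lengths are supposed to match.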

You can also extend the zip() function to more than two iterables:

# Define lists to be used for this example:
letters = ["K", "O", "R", "J", "P", "X"]
genders = ["Male", "Male", "Male", "Female", "Female", "Female", "Female"]
scores = (65, 86, 88, 74, 69)

# Create a zipped object
zipped = zip(letters, genders, scores)
print(zipped)

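See if you can work out the output of the code snippet above. (Hint: zip() returns an iterator, so think about what printing it directly will show, and what you would need to do to see the tuples themselves.)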
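If you want to check your answer, here is one way to inspect the zipped object. This is only a sketch, and the `triples` name is mine.

```python
letters = ["K", "O", "R", "J", "P", "X"]
genders = ["Male", "Male", "Male", "Female", "Female", "Female", "Female"]
scores = (65, 86, 88, 74, 69)

zipped = zip(letters, genders, scores)
print(zipped)  # something like <zip object at 0x...>; the address varies

# Convert to a list to see the tuples; scores is shortest, so there are 5
triples = list(zipped)
print(triples)
# [('K', 'Male', 65), ('O', 'Male', 86), ('R', 'Male', 88), ('J', 'Female', 74), ('P', 'Female', 69)]

# An iterator can only be consumed once, so a second pass is empty
print(list(zipped))  # []
```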

The zip_longest() function

The zip_longest() function is meant for scenarios where you need to align and process data from multiple sources with different lengths. Unlike zip(), which is built in, this one has to be imported from the itertools module.

from itertools import zip_longest

# Syntax:
zip_longest(iterable1, iterable2, ..., fillvalue=None)

The zip_longest() function combines elements from multiple iterable data structures, creating an iterator of tuples. It continues until the longest input iterable is exhausted (hence the name), filling up missing values with a specified "fill value", the default being None (but you can use anything you want).

Here's an example:

from itertools import zip_longest

list_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list_2 = [7, 9, 11, 13, 15]

# To use a different fill value, pass it via the fillvalue parameter
combined = list(zip_longest(list_1, list_2))
print(combined)

# Output: [(1, 7), (2, 9), (3, 11), (4, 13), (5, 15), (6, None), (7, None), (8, None), (9, None), (10, None)]

zip_longest() returns an iterator, so you will usually convert it to a list or consume it in a loop.
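As a quick sketch of a custom fill value, the same two lists can be padded with 0 instead of None, which makes it possible to add the pairs together without tripping over a missing value. The `totals` name and the choice of 0 are my own assumptions for the sake of the example.

```python
from itertools import zip_longest

list_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
list_2 = [7, 9, 11, 13, 15]

# Pad the shorter list with 0 rather than the default None
combined = list(zip_longest(list_1, list_2, fillvalue=0))
print(combined[4:7])  # [(5, 15), (6, 0), (7, 0)]

# Consume the iterator directly in a loop (here, a list comprehension)
totals = [a + b for a, b in zip_longest(list_1, list_2, fillvalue=0)]
print(totals)  # [8, 11, 14, 17, 20, 6, 7, 8, 9, 10]
```

With the default fill value of None, the addition above would raise a TypeError once list_2 runs out, so the right fill value depends on what you plan to do with the tuples.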

Conclusion

Long story short, zip() is suitable when the iterables are of equal length, or when you are happy to stop as soon as the shortest iterable is exhausted. On the other hand, zip_longest() is useful when the iterables have different lengths and you want to continue until the longest one is exhausted, filling missing values with a specified fill value.

Here's what brought all of this up: some time back, I was working on a project that needed me to display the output in a certain way. When I tried the zip() function, it didn't give me what I wanted, so I did some digging and found the zip_longest() function. I decided to share my "findings" with others who may not be aware of it.


Thank you so much for reading my very first Hashnode article! You can find me on Facebook here, on Twitter (now X) here, on Instagram here, on LinkedIn here, and GitHub here. Please follow, comment (if you wish), but most importantly — share!

Written by

Samuel B. Olugunna

Data Science and Analytics, ML, and BI. A numbers guy and a lifelong learner.