Demystifying list comprehensions in Python
If you are new to Python, you might have stumbled across "list comprehension". I know that for more experienced Python developers, this is an easy concept, but from my experience, most newcomers to Python struggle with it. My goal with this article is to explain list comprehensions quickly.
Let's start with a simple example. We have a list of words, and we want to create a new list containing the amount of vowels in each word. If we don't know anything about list comprehensions, we would write our code something like this:
words = ["apple", "banana", "orange", "grape", "kiwi", "pineapple", "watermelon", "strawberry", "blueberry", "peach"]
def count_vowels(word):
vowels = 'aeiouAEIOU'
count = 0
for char in word:
if char in vowels:
count += 1
return count
result = []
for word in words:
result.append(count_vowels(word))
print(result)
The result of this program is [2, 3, 3, 2, 2, 4, 4, 2, 3, 2]
.
How do we rewrite this program so that it uses list comprehensions? A list comprehension is an expression that creates a new list based on three things - an iterable
(if you don't know what an Iterable
is, click below), a transformation (what we do with each of the items), and a filter (a function which checks if an item should be included in the new list). Here is the base form of the list comprehension in Python:
In our case, the iterable is words,
and the transformation is to apply count_vowels
to each word in words
. So, instead of creating a new list, iterating through words
, and appending the result of count_vowels
, we can write result = [count_vowels(word) for word in words]
. Here is the modified code:
In our case, the iterable is words,
and the transformation is to apply count_vowels
to each word in words
. So, instead of creating a new list, iterating through words
, and appending the result of count_vowels
, we can write result = [count_vowels(word) for word in words]
. Here is the modified code:
words = ["apple", "banana", "orange", "grape", "kiwi", "pineapple", "watermelon", "strawberry", "blueberry", "peach"]
def count_vowels(word):
vowels = 'aeiouAEIOU'
count = 0
for char in word:
if char in vowels:
count += 1
return count
result = [count_vowels(word) for word in words]
print(result)
The result of this program is still [2, 3, 3, 2, 2, 4, 4, 2, 3, 2]
, so that hasn't changed. Notice how our code got a bit shorter now when using list comprehensions. As we will see below, this is not the only benefit of using list comprehensions.
Another example: We have a list of numbers and want to create a new list containing only the numbers with a power of 2. We have a function called is_power_of_2
, which will return True
if the number is indeed a power of 2 and False
otherwise.
If we don't know anything about list comprehensions, we would write our code something like this:
import math
def is_power_of_2(number):
return 2 ** int(math.log2(number)) == number
numbers = [64, 84, 2, 56, 29, 4, 18, 91, 42, 75]
result = []
for number in numbers:
if is_power_of_2(number):
result.append(number)
print(result)
The result of this program is [64, 2, 4]
as 2, 4 and 64 are powers of 2 (2^1=2, 2^2=4, 2^6=64). How would this snippet look like if we rewrote it using list comprehensions?
import math
def is_power_of_2(number):
return 2 ** int(math.log2(number)) == number
numbers = [64, 84, 2, 56, 29, 4, 18, 91, 42, 75]
result = [number for number in numbers if is_power_of_2(number)]
print(result)
Again, the program's result hasn't changed (still [2, 4]
). We just replaced the for loop and append
with a list comprehension. This time, we don't modify the number itself; we need the ones that are powers of 2 (or, simply, the numbers for which is_power_of_2
returns True
)
Let's go back to the first example, the counting of the vowels. Can we rewrite the count_vowels
function to use list comprehensions? This one is a bit trickier because the action we do is not that clear - we increase a counter, and we don't produce a list. Or at least not at this moment. If we carefully examine what the functions do, we can get a clue on how to rewrite as a list comprehension.
def count_vowels(word):
vowels = 'aeiouAEIOU'
count = 0
for char in word:
if char in vowels:
count += 1
return count
This function goes through each character in the word, checks if the character is present in the vowel string, and increases the counter by 1. What if, instead of increasing a counter, we filter out the characters that are vowels? From there, we can count the length of the resulting list and get the number of vowels (as they will be the only ones left). The code would look something like this:
def count_vowels(word):
vowels = 'aeiouAEIOU'
vowels_in_word = [char for char in word if char in vowels]
return len(vowels_in_word)
Looks better.
Let's look at another example, where we will dive into the world of chess. We want to write a Python function that calculates all the possible moves for that knight from a given position of a knight.
A short refresher on knight's movement: it can move either two squares horizontally and one square vertically or one square horizontally and two squares vertically. If our knight is on d5, it can move to e7, f6, f4, e3, c3, b4, b6, c7.
Let's try to write this function without using list comprehensions. The main flow will be the following - convert the input from chess notation to a pair of integers (so d5
will become (4, 5)
). Then, we will generate all the moves from a given offset. After that, we will filter out the valid moves (as some might be outside the chessboard's bounds). We will convert the remaining moves back to chess notation and return them.
def possible_knight_moves(current_position: str) -> list[str]:
# Convert the chess notation to a numerical one: a -> 1, b -> 2, etc.
current_row = ord(current_position[0]) - ord('a') + 1
current_column = int(current_position[1])
# Generate all the moves
offsets = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]
all_moves = []
for offset_row, offset_column in offsets:
new_row = current_row + offset_row
new_column = current_column + offset_column
all_moves.append((new_row, new_column))
# Filter out the valid moves
valid_moves = []
for row, column in all_moves:
if 1 <= row <= 8 and 1 <= column <= 8:
valid_moves.append((row, column))
# Convert valid moves back to chess notation
chess_notation_moves = []
for row, column in valid_moves:
move = '{}{}'.format(chr(row + ord('a') - 1), column)
chess_notation_moves.append(move)
return chess_notation_moves
>>> possible_knight_moves('d5')
['f6', 'e7', 'c7', 'b6', 'b4', 'c3', 'e3', 'f4']
Now, this is a big function. Before we transform it with list comprehensions, let's split it up a bit, following the "single-responsibility" principle.
def convert_from_chess_notation(current_position: str) -> tuple[int, int]:
# Convert the chess notation to a numerical one: a -> 1, b -> 2, etc.
current_row = ord(current_position[0]) - ord('a') + 1
current_column = int(current_position[1])
return current_row, current_column
def generate_all_moves(current_row: int, current_column: int) -> list[tuple[int, int]]:
# Generate all the moves
offsets = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]
all_moves = []
for offset_row, offset_column in offsets:
new_row = current_row + offset_row
new_column = current_column + offset_column
all_moves.append((new_row, new_column))
return all_moves
def filter_out_valid_moves(all_moves: list[tuple[int, int]]) -> list[tuple[int, int]]:
# Filter out the valid moves
valid_moves = []
for row, column in all_moves:
if 1 <= row <= 8 and 1 <= column <= 8:
valid_moves.append((row, column))
return valid_moves
def convert_to_chess_notation(moves: list[tuple[int, int]]) -> list[str]:
# Convert valid moves back to chess notation
chess_notation_moves = []
for row, column in moves:
move = '{}{}'.format(chr(row + ord('a') - 1), column)
chess_notation_moves.append(move)
return chess_notation_moves
def possible_knight_moves(current_position: str) -> list[str]:
current_row, current_column = convert_from_chess_notation(current_position)
all_moves = generate_all_moves(current_row, current_column)
valid_moves = filter_out_valid_moves(all_moves)
chess_notation_moves = convert_to_chess_notation(valid_moves)
return chess_notation_moves
It's a bit better, but still a lot of code. Let's start applying list comprehensions to each of the functions:
def convert_from_chess_notation(current_position: str) -> tuple[int, int]:
# Convert the chess notation to a numerical one: a -> 1, b -> 2, etc.
current_row = ord(current_position[0]) - ord('a') + 1
current_column = int(current_position[1])
return current_row, current_column
def generate_all_moves(current_row: int, current_column: int) -> list[tuple[int, int]]:
# Generate all the moves
offsets = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]
all_moves = [(current_row + offset_row, current_column + offset_column) for offset_row, offset_column in offsets]
return all_moves
def filter_out_valid_moves(all_moves: list[tuple[int, int]]) -> list[tuple[int, int]]:
# Filter out the valid moves
valid_moves = [(row, column) for row, column in all_moves if 1 <= row <= 8 and 1 <= column <= 8]
return valid_moves
def convert_to_chess_notation(moves: list[tuple[int, int]]) -> list[str]:
# Convert valid moves back to chess notation
chess_notation_moves = ['{}{}'.format(chr(row + ord('a') - 1), column) for row, column in valid_moves]
return chess_notation_moves
def possible_knight_moves(current_position: str) -> list[str]:
current_row, current_column = convert_from_chess_notation(current_position)
all_moves = generate_all_moves(current_row, current_column)
valid_moves = filter_out_valid_moves(all_moves)
chess_notation_moves = convert_to_chess_notation(valid_moves)
return chess_notation_moves
While this code looks good, there is still room for improvement. Let's take a look at generate_all_moves
and filter_out_valid_moves
. Currently, we generate all possible moves from a single move and then pass that to the filter function to filter out the valid moves. What if we had a function that checked if a single move is valid? Then, we can combine the generation of the moves and the filtering in a simple comprehension. We can also separate the generation of a single move from a given start move position and an offset. The conversion to chess notation can also be implemented for a single location.
def generate_move(current_row: int, current_column: int, offset: tuple[int, int]) -> tuple[int, int]:
offset_row, offset_column = offset
return current_row + offset_row, current_column + offset_column
def is_move_valid(move: tuple[int, int]) -> bool:
row, column = move
return 1 <= row <= 8 and 1 <= column <= 8
def convert_to_chess_notation(move: tuple[int, int]) -> str:
row, column = move
return '{}{}'.format(chr(row + ord('a') - 1), column)
With these two methods, we can change the way we generate the moves in the following way:
valid_moves = [generate_move(current_row, current_column, offset) for offset in offsets if is_move_valid(generate_move(current_row, current_column, offset))]
For each offset, we will call generate_move
to get the move we want and filter out valid moves (by calling is_move_valid
). With this modification, our complete code will look something like this:
def convert_from_chess_notation(current_position: str) -> tuple[int, int]:
# Convert the chess notation to a numerical one: a -> 1, b -> 2, etc.
current_row = ord(current_position[0]) - ord('a') + 1
current_column = int(current_position[1])
return current_row, current_column
def convert_to_chess_notation(move: tuple[int, int]) -> str:
row, column = move
return '{}{}'.format(chr(row + ord('a') - 1), column)
def generate_move(current_row: int, current_column: int, offset: tuple[int, int]) -> tuple[int, int]:
offset_row, offset_column = offset
return current_row + offset_row, current_column + offset_column
def is_move_valid(move: tuple[int, int]) -> bool:
row, column = move
return 1 <= row <= 8 and 1 <= column <= 8
def possible_knight_moves(current_position: str) -> list[str]:
offsets = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]
current_row, current_column = convert_from_chess_notation(current_position)
valid_moves = [generate_move(current_row, current_column, offset) for offset in offsets if is_move_valid(generate_move(current_row, current_column, offset))]
chess_notation_moves = [convert_to_chess_notation(move) for move in valid_moves]
return chess_notation_moves
One final modification must be made before declaring that the code is ready. Notice how, in the valid_moves
comprehensions, we call the generate_move
function twice. With Python3.8 and newer, this can be avoided using the walrus operator. The walrus operator allows us to assign a name to an expression, which can be used on multiple places in the list comprehension. An important detail to remember is where the operator will be used in the comprehension. It's not in the transformation part but in the filter part (as the if
part is evaluated before the transformation part).
The syntax for the walrus operator itself is straightforward and looks something like this: name := expression
. In our case, we can give a name for the result of generate_move
. The comprehension would look something like this:
valid_moves = [new_move for offset in offsets if is_move_valid((new_move := generate_move(current_row, current_column, offset)))]
Notice how we name the generate_move
result to new_move
and use that name in the transformation part.
Finally, our code looks something like this:
def convert_from_chess_notation(current_position: str) -> tuple[int, int]:
# Convert the chess notation to a numerical one: a -> 1, b -> 2, etc.
current_row = ord(current_position[0]) - ord('a') + 1
current_column = int(current_position[1])
return current_row, current_column
def convert_to_chess_notation(move: tuple[int, int]) -> str:
row, column = move
return '{}{}'.format(chr(row + ord('a') - 1), column)
def generate_move(current_row: int, current_column: int, offset: tuple[int, int]) -> tuple[int, int]:
offset_row, offset_column = offset
return current_row + offset_row, current_column + offset_column
def is_move_valid(move: tuple[int, int]) -> bool:
row, column = move
return 1 <= row <= 8 and 1 <= column <= 8
def possible_knight_moves(current_position: str) -> list[str]:
offsets = [(2, 1), (1, 2), (-1, 2), (-2, 1), (-2, -1), (-1, -2), (1, -2), (2, -1)]
current_row, current_column = convert_from_chess_notation(current_position)
valid_moves = [new_move for offset in offsets if is_move_valid((new_move := generate_move(current_row, current_column, offset)))]
chess_notation_moves = [convert_to_chess_notation(move) for move in valid_moves]
return chess_notation_moves
We have now successfully converted our code from for-loops to list comprehensions. But what about the code's performance? Is there a performance benefit of using list comprehensions?
Let's write a small benchmark script that creates a list with numbers from 0 to 9999999. The script will use both the for-loop method and the list comprehension and compare the times it took both to create the list.
import random
import time
if __name__ == "__main__":
start = time.time()
result = []
for i in range(10_000_000):
result.append(i)
end = time.time()
print(f'Append: {(end-start):.2f}')
start = time.time()
result = [i for i in range(10_000_000)]
end = time.time()
print(f'List comprehension {(end-start):.2f}')
For-loop | List comprehension | |
Run 1 | 0.61s | 0.38s |
Run 2 | 0.55s | 0.36s |
Run 3 | 0.57s | 0.28s |
Run 4 | 0.59s | 0.36s |
Run 5 | 0.65s | 0.35s |
Average | 0.59s | 0.35s |
As we can see, on average, list comprehensions are around 40% faster.
I hope you found this article helpful! If you did, make sure to leave a like. If you want to read more Python, click here. If you want to read something shorter, click here. As always, happy coding!
Subscribe to my newsletter
Read articles from Lyuboslav Karev directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Lyuboslav Karev
Lyuboslav Karev
Software developer. Python & Rust fanboy. I love discussing good software design and architecture