Know your Python modules: collections

Mohamed Ashour

When you start learning Python and see how much you can do with it easily and without much confusion, it is easy to overlook some of the features and modules that can help you write better code. One of them is the collections module.

collections is a standard-library module that provides ready-made container data structures to help you write more efficient and elegant code. In this post, I will walk you through some of its features and point you to where you can learn more about it.

Counter

In [1]: from collections import Counter

In [2]: Counter("Mohammed Ashour")
Out[2]:
Counter({'M': 1,
         'o': 2,
         'h': 2,
         'a': 1,
         'm': 2,
         'e': 1,
         'd': 1,
         ' ': 1,
         'A': 1,
         's': 1,
         'u': 1,
         'r': 1})

As you can see, a Counter takes an iterable and counts how many times each element occurs in it. From there you can use the other methods packed into the Counter class, such as most_common:

In [3]: x = Counter("Mohammed Ashour")

In [4]: x.most_common(3)
Out[4]: [('o', 2), ('h', 2), ('m', 2)]

Calling most_common with 3 returns the three most common elements in the counter together with their counts. Elements with equal counts are listed in the order they were first encountered. You can also get the least common elements using this trick:

In [5]: x.most_common()[-3:]
Out[5]: [('s', 1), ('u', 1), ('r', 1)]
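The same idea applies to any iterable, not only strings. Here is a minimal sketch that counts words instead of characters. The sample sentence and the variable names are made up for illustration, and the backwards slice is just another way of writing the trick above, one that returns the rarest element first.

```python
from collections import Counter

# Hypothetical sample text, split into words so that Counter counts words.
words = "the quick brown fox jumps over the lazy dog the end".split()
word_counts = Counter(words)

# The two most common words, with ties kept in first-seen order.
top_two = word_counts.most_common(2)

# Slice the full ranking backwards to get the n least common words,
# starting from the rarest.
n = 2
least_two = word_counts.most_common()[:-n - 1:-1]

print(top_two)
print(least_two)
```

Note that a missing key does not raise an error in a Counter: word_counts["cat"] simply returns 0.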

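Counter also supports arithmetic between Counter objects, so you can add and subtract the counts of matching elements in two counters. Subtraction keeps only the elements whose resulting count is positive. The definitions of a and b are not shown in this session; the outputs below are consistent with a = Counter("abcdefg") and b = Counter("efghijk").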

In [6]: a-b
Out[6]: Counter({'a': 1, 'b': 1, 'c': 1, 'd': 1})

In [7]: b-a
Out[7]: Counter({'h': 1, 'i': 1, 'j': 1, 'k': 1})

In [8]: a+b
Out[8]:
Counter({'a': 1,
         'b': 1,
         'c': 1,
         'd': 1,
         'e': 2,
         'f': 2,
         'g': 2,
         'h': 1,
         'i': 1,
         'j': 1,
         'k': 1})
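To make the session above reproducible, here is a small sketch that assumes a and b were built from the strings "abcdefg" and "efghijk". The original session does not show how they were defined, so that is my guess, chosen because it matches the printed outputs. The sketch also shows the & and | operators and the subtract method, which behaves differently from the - operator.

```python
from collections import Counter

# Assumed definitions: not shown in the original session, chosen to match
# the outputs printed there.
a = Counter("abcdefg")
b = Counter("efghijk")

diff = a - b      # keeps only elements whose resulting count is positive
total = a + b     # adds the counts of matching elements
common = a & b    # intersection: the minimum of the two counts
union = a | b     # union: the maximum of the two counts

# subtract() works in place and keeps zero and negative counts.
signed = Counter(a)
signed.subtract(b)

print(diff)
print(common)
print(signed["e"], signed["h"])
```

The difference matters when you want to know how far below zero a count went: a - b silently drops 'h', while signed["h"] is -1.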

ChainMap

You will often have several dicts that you need to look up a value in, and you end up either merging them or searching them one by one. This is where ChainMap comes to the rescue: it groups several mappings into a single view and searches them in order.

In [15]: from collections import ChainMap

In [16]: env1 = {"A":1, "B":2, "C":3}

In [17]: env2 = {"C":4, "D":5, "E":6}

In [18]: ChainMap(env1, env2)
Out[18]: ChainMap({'A': 1, 'B': 2, 'C': 3}, {'C': 4, 'D': 5, 'E': 6})

In [19]: cm = ChainMap(env1, env2)

In [20]: cm["A"]
Out[20]: 1

In [21]: cm["D"]
Out[21]: 5

In [22]: cm["C"]
Out[22]: 3

In [23]: cm.new_child({"F":7})
Out[23]: ChainMap({'F': 7}, {'A': 1, 'B': 2, 'C': 3}, {'C': 4, 'D': 5, 'E': 6})

In [24]: cm["F"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-24-3c1bb143dc91> in <module>
----> 1 cm["F"]

/usr/lib/python3.8/collections/__init__.py in __getitem__(self, key)
    896             except KeyError:
    897                 pass
--> 898         return self.__missing__(key)            # support subclasses that define __missing__
    899
    900     def get(self, key, default=None):

/usr/lib/python3.8/collections/__init__.py in __missing__(self, key)
    888
    889     def __missing__(self, key):
--> 890         raise KeyError(key)
    891
    892     def __getitem__(self, key):

KeyError: 'F'

In [25]: cm
Out[25]: ChainMap({'A': 1, 'B': 2, 'C': 3}, {'C': 4, 'D': 5, 'E': 6})

In [26]: cm.update({"F":7})

In [27]: cm["F"]
Out[27]: 7

In [28]: cm
Out[28]: ChainMap({'A': 1, 'B': 2, 'C': 3, 'F': 7}, {'C': 4, 'D': 5, 'E': 6})

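As you can see, I used several methods to show how a ChainMap works and how each one affects the object. Lookups search the mappings in order, so cm["C"] returns 3 from env1 rather than 4 from env2. new_child returns a new ChainMap and leaves cm unchanged, which is why cm["F"] raises a KeyError right after it. update, like any other write, goes to the first mapping only, so F ends up in env1.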

But when is it actually useful? There are a lot of great use cases for ChainMap, and one of them is setting defaults.

Imagine you have a config with some default values. You want to use them until you override them with other values, and you want to be able to add more configs later. That is where ChainMap shines: it is easy to use, without a lot of hassle.
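Here is a minimal sketch of that defaults pattern. The dict names and the settings in them (host, port, debug) are hypothetical and only there for illustration; the point is the lookup order, with the highest-priority mapping first.

```python
from collections import ChainMap

# Hypothetical configuration layers, from lowest to highest priority.
defaults = {"host": "localhost", "port": 8080, "debug": False}
user_config = {"port": 9000}
cli_args = {"debug": True}

# Lookups try cli_args first, then user_config, then defaults.
settings = ChainMap(cli_args, user_config, defaults)
port = settings["port"]      # found in user_config
host = settings["host"]      # falls through to defaults
debug = settings["debug"]    # found in cli_args

# new_child returns a new ChainMap with an extra layer in front;
# the original settings object is left unchanged.
scoped = settings.new_child({"host": "example.test"})
scoped_host = scoped["host"]
original_host = settings["host"]

# Writes always go to the first mapping, here cli_args.
settings["port"] = 9500

print(port, host, debug)
print(scoped_host, original_host)
print(cli_args, user_config)
```

Nothing is copied: the ChainMap only keeps references to the underlying dicts, so a later change to defaults is visible through settings immediately.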

deque

A deque (double-ended queue) is a ready-made sequence that you can use as both a stack and a queue in Python, with fast appends and pops at either end.

In [29]: from collections import deque

In [30]: x = [1,2,3,4,5,6,7]

In [31]: deque(x)
Out[31]: deque([1, 2, 3, 4, 5, 6, 7])

In [32]: q = deque(x)

In [33]: q.pop()
Out[33]: 7

In [34]: q
Out[34]: deque([1, 2, 3, 4, 5, 6])

In [35]: q.popleft()
Out[35]: 1

In [36]: q
Out[36]: deque([2, 3, 4, 5, 6])

In [37]: q.rotate(1)

In [38]: q
Out[38]: deque([6, 2, 3, 4, 5])

In [39]: q.insert(1,3)

In [40]: q
Out[40]: deque([6, 3, 2, 3, 4, 5])

In [41]: q.append(1)

In [42]: q
Out[42]: deque([6, 3, 2, 3, 4, 5, 1])

In [43]: q.appendleft(1)

In [44]: q
Out[44]: deque([1, 6, 3, 2, 3, 4, 5, 1])

As you can see, there is a lot of functionality packed into the deque class, like appending and popping from both ends. You can also insert into and rotate the deque.
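To show the stack and queue roles side by side, here is a short sketch. The task and page names are made up, and the maxlen argument is an extra feature of deque that the session above does not use: it caps the size and discards items from the opposite end when the deque is full.

```python
from collections import deque

# Queue (first in, first out): append on the right, pop from the left.
queue = deque()
queue.append("task-1")
queue.append("task-2")
queue.append("task-3")
first_out = queue.popleft()

# Stack (last in, first out): append and pop on the same side.
stack = deque()
stack.append("page-1")
stack.append("page-2")
last_in = stack.pop()

# Bounded deque: keeps only the 3 most recent items.
recent = deque(maxlen=3)
for i in range(5):
    recent.append(i)

print(first_out, last_in)
print(recent)
```

A plain list can do the same job as a stack, but removing items from the front of a list gets slower as the list grows, which is why a deque is the usual choice for queues.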

defaultdict

defaultdict is a dict subclass that takes a factory function and calls it to create a value for any missing key. This makes it very simple to build a dict whose values all share the same structure, as in the following example:

In [50]: from collections import defaultdict

In [51]: d = defaultdict(int)

In [52]: d
Out[52]: defaultdict(int, {})

In [53]: sen = "Hello All welcome to my blog"

In [54]: for char in sen:
    ...:     d[char] += 1
    ...:

In [55]: d
Out[55]:
defaultdict(int,
            {'H': 1,
             'e': 3,
             'l': 6,
             'o': 4,
             ' ': 5,
             'A': 1,
             'w': 1,
             'c': 1,
             'm': 2,
             't': 1,
             'y': 1,
             'b': 1,
             'g': 1})

In this example, because we told the defaultdict to create its values with int, every missing key starts at 0. That lets us increment the values without initializing them first or checking whether the key exists.

It's very handy and I use it in a lot of problem-solving situations.
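The factory does not have to be int. Here is a small sketch that uses list to group items; the word list is made up for illustration. It also shows one thing to watch out for: reading a missing key with square brackets creates that key.

```python
from collections import defaultdict

# Hypothetical sample data: group words by their first letter.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

by_letter = defaultdict(list)
for word in words:
    # A missing key gets a fresh empty list, so append always works.
    by_letter[word[0]].append(word)

# Reading a missing key with [] inserts it; get() does not.
had_z_before = "z" in by_letter
value_for_z = by_letter["z"]
has_z_after = "z" in by_letter
has_y_after_get = by_letter.get("y") is not None or "y" in by_letter

print(dict(by_letter))
print(had_z_before, has_z_after, has_y_after_get)
```

If you only want to check for a key without creating it, use the in operator or get, as in the last lines of the sketch.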

The collections module is a very useful one. I have only introduced a few of its tools here, and you can do a lot more with it in a Pythonic way. You can learn more from the official documentation: https://docs.python.org/3/library/collections.html

