Know your Python modules: collections
When you start learning Python and see how easily you can do amazing things with it, it is easy to overlook some of its most useful features and modules that can help you write better code. One of them is the collections module.
collections is a module in the standard library that provides ready-made, specialized container data types for writing more efficient and elegant code. In this blog, I will walk you through some of its features and point you to a resource for learning more about it.
Counter
In [1]: from collections import Counter
In [2]: Counter("Mohammed Ashour")
Out[2]:
Counter({'M': 1,
'o': 2,
'h': 2,
'a': 1,
'm': 2,
'e': 1,
'd': 1,
' ': 1,
'A': 1,
's': 1,
'u': 1,
'r': 1})
As you can see, a Counter object takes an iterable and counts how many times each element occurs in it. From there you can use the methods packed into the Counter class, such as most_common:
In [3]: x = Counter("Mohammed Ashour")
In [4]: x.most_common(3)
Out[4]: [('o', 2), ('h', 2), ('m', 2)]
Calling most_common and passing 3 returns the three most common elements in the counter, with ties listed in the order they were first seen. You can also get the least common elements with this trick:
In [5]: x.most_common()[-3:]
Out[5]: [('s', 1), ('u', 1), ('r', 1)]
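Counter is not limited to characters. As a minimal sketch, assuming a made-up sentence (the text and variable names below are hypothetical, not from the session above), the same methods work on a list of words:

```python
from collections import Counter

# Hypothetical sample text, chosen only for illustration.
text = "the quick brown fox jumps over the lazy dog the fox"

# Counter accepts any iterable of hashable items, so a list of words works too.
word_counts = Counter(text.split())

# The two most frequent words, highest count first.
top_two = word_counts.most_common(2)

# Looking up a missing key returns 0 instead of raising KeyError,
# and it does not add the key to the counter.
missing = word_counts["cat"]

print(top_two)   # [('the', 3), ('fox', 2)]
print(missing)   # 0
```

The zero default for missing keys is what makes a Counter convenient for tallies: there is no need to check whether a key exists before reading it.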
Counter also supports arithmetic between counter objects, so you can add and subtract the counts of two objects. Subtraction keeps only the elements whose resulting count is positive:
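In [6]: a = Counter("abcdefg")  # assumed definition, consistent with the outputs below
In [7]: b = Counter("efghijk")  # assumed definition, consistent with the outputs below
In [8]: a-b
Out[8]: Counter({'a': 1, 'b': 1, 'c': 1, 'd': 1})
In [9]: b-a
Out[9]: Counter({'h': 1, 'i': 1, 'j': 1, 'k': 1})
In [10]: a+b
Out[10]: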
Counter({'a': 1,
'b': 1,
'c': 1,
'd': 1,
'e': 2,
'f': 2,
'g': 2,
'h': 1,
'i': 1,
'j': 1,
'k': 1})
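Beyond + and -, Counter also supports intersection and union, and an in-place subtract method that behaves differently from the - operator. The following is a small sketch with hypothetical inventory data (the names stock and sold are assumptions for illustration only):

```python
from collections import Counter

# Hypothetical inventory data, used only to illustrate the operators.
stock = Counter(apples=5, pears=2)
sold = Counter(apples=3, pears=4)

remaining = stock - sold   # binary minus drops zero and negative results
combined = stock + sold    # counts are added per key
lowest = stock & sold      # intersection: minimum count per key
highest = stock | sold     # union: maximum count per key

# subtract() updates the counter in place and keeps negative counts.
balance = Counter(stock)
balance.subtract(sold)

print(remaining)          # Counter({'apples': 2})
print(balance["pears"])   # -2
```

Which one to use depends on whether negative counts are meaningful for your problem: the operators discard them, while subtract keeps them.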
ChainMap
Often you have multiple dicts and need to look up a value across them, so you end up either merging them or searching them one by one. This is where ChainMap comes to the rescue:
In [15]: from collections import ChainMap
In [16]: env1 = {"A":1, "B":2, "C":3}
In [17]: env2 = {"C":4, "D":5, "E":6}
In [18]: ChainMap(env1, env2)
Out[18]: ChainMap({'A': 1, 'B': 2, 'C': 3}, {'C': 4, 'D': 5, 'E': 6})
In [19]: cm = ChainMap(env1, env2)
In [20]: cm["A"]
Out[20]: 1
In [21]: cm["D"]
Out[21]: 5
In [22]: cm["C"]
Out[22]: 3
In [23]: cm.new_child({"F":7})
Out[23]: ChainMap({'F': 7}, {'A': 1, 'B': 2, 'C': 3}, {'C': 4, 'D': 5, 'E': 6})
In [24]: cm["F"]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-24-3c1bb143dc91> in <module>
----> 1 cm["F"]
/usr/lib/python3.8/collections/__init__.py in __getitem__(self, key)
896 except KeyError:
897 pass
--> 898 return self.__missing__(key) # support subclasses that define __missing__
899
900 def get(self, key, default=None):
/usr/lib/python3.8/collections/__init__.py in __missing__(self, key)
888
889 def __missing__(self, key):
--> 890 raise KeyError(key)
891
892 def __getitem__(self, key):
KeyError: 'F'
In [25]: cm
Out[25]: ChainMap({'A': 1, 'B': 2, 'C': 3}, {'C': 4, 'D': 5, 'E': 6})
In [26]: cm.update({"F":7})
In [27]: cm["F"]
Out[27]: 7
In [28]: cm
Out[28]: ChainMap({'A': 1, 'B': 2, 'C': 3, 'F': 7}, {'C': 4, 'D': 5, 'E': 6})
As you can see, I used several methods to show how a ChainMap works and how each one affects the object. A lookup searches the mappings from left to right, which is why cm["C"] returns 3 from env1 rather than 4 from env2. new_child returns a new ChainMap and leaves cm unchanged, which is why cm["F"] raises a KeyError, while update writes only to the first mapping (env1 here).
But when is it useful? There are a lot of great use cases for ChainMap, and one of them is setting defaults.
Imagine you have a config with some default values. You want to use them until you override them with other values, and you want to be able to layer additional configs on top. That is where ChainMap shines, because it handles this without a lot of hassle.
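The defaults use case could look something like the sketch below. The setting names and values (host, port, debug) are hypothetical and only illustrate the layering:

```python
from collections import ChainMap

# Hypothetical settings, invented for this example.
defaults = {"host": "localhost", "port": 8000, "debug": False}
user_config = {"port": 9000}

# Earlier mappings win, so user_config overrides defaults.
settings = ChainMap(user_config, defaults)
port = settings["port"]   # 9000, taken from user_config
host = settings["host"]   # 'localhost', falls back to defaults

# new_child returns a NEW ChainMap with an extra layer in front;
# the original settings object is left untouched.
session = settings.new_child({"debug": True})

# Writes go to the first mapping only, so the defaults stay intact.
session["host"] = "example.test"

print(port, host)                            # 9000 localhost
print(session["debug"], settings["debug"])   # True False
print(defaults["host"])                      # localhost
```

Because the underlying dicts are referenced rather than copied, later changes to defaults or user_config are visible through settings without rebuilding anything.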
deque
The deque object (short for double-ended queue) is Python's ready-made container for building stacks and queues, with fast appends and pops at both ends.
In [29]: from collections import deque
In [30]: x = [1,2,3,4,5,6,7]
In [31]: deque(x)
Out[31]: deque([1, 2, 3, 4, 5, 6, 7])
In [32]: q = deque(x)
In [33]: q.pop()
Out[33]: 7
In [34]: q
Out[34]: deque([1, 2, 3, 4, 5, 6])
In [35]: q.popleft()
Out[35]: 1
In [36]: q
Out[36]: deque([2, 3, 4, 5, 6])
In [37]: q.rotate(1)
In [38]: q
Out[38]: deque([6, 2, 3, 4, 5])
In [39]: q.insert(1,3)
In [40]: q
Out[40]: deque([6, 3, 2, 3, 4, 5])
In [41]: q.append(1)
In [42]: q
Out[42]: deque([6, 3, 2, 3, 4, 5, 1])
In [43]: q.appendleft(1)
In [44]: q
Out[44]: deque([1, 6, 3, 2, 3, 4, 5, 1])
As you can see, there is a lot of functionality packed into the deque class, such as appending and popping from both sides. You can also insert into and rotate the queue.
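To make the stack and queue roles concrete, here is a minimal sketch with made-up values. It also shows the maxlen argument, which is not used in the session above but is part of the deque constructor:

```python
from collections import deque

# Stack (last in, first out): append and pop on the same side.
stack = deque()
stack.append("a")
stack.append("b")
last_in = stack.pop()        # 'b'

# Queue (first in, first out): append on the right, pop from the left.
queue = deque(["first", "second"])
queue.append("third")
first_out = queue.popleft()  # 'first'

# Bounded deque: once maxlen is reached, appending on one side
# silently discards items from the opposite side.
recent = deque(maxlen=3)
for n in range(1, 6):
    recent.append(n)

print(last_in, first_out)    # b first
print(recent)                # deque([3, 4, 5], maxlen=3)
```

A bounded deque is a handy way to keep only the last few items of a stream, such as the most recent log lines.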
defaultdict
defaultdict is a dict subclass that takes a factory function and calls it to create the value for any missing key. This makes it very simple to build a dict whose values share a standard structure, as in the following example:
In [50]: from collections import defaultdict
In [51]: d = defaultdict(int)
In [52]: d
Out[52]: defaultdict(int, {})
In [53]: sen = "Hello All welcome to my blog"
In [54]: for char in sen:
    ...:     d[char] += 1
    ...:
In [55]: d
Out[55]:
defaultdict(int,
{'H': 1,
'e': 3,
'l': 6,
'o': 4,
' ': 5,
'A': 1,
'w': 1,
'c': 1,
'm': 2,
't': 1,
'y': 1,
'b': 1,
'g': 1})
In this example, because the dict's default values are created by int (which returns 0), we can increment a value without initializing it or checking whether the key exists.
It's very handy and I use it in a lot of problem-solving situations.
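The factory does not have to be int. As a sketch with a hypothetical word list, passing list as the factory gives a simple way to group items:

```python
from collections import defaultdict

# Hypothetical data, used only for illustration.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Each missing key starts out as an empty list.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Caveat: merely reading a missing key also creates it.
created = by_letter["z"]
print("z" in by_letter)   # True
```

If you only want to check for a key without creating it, use the in operator or the get method, neither of which triggers the factory.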
The collections module is a very useful one. I have introduced only a few of its options and built-ins here, and you can do a lot more with it in a more Pythonic way. You can learn more from the official docs: https://docs.python.org/3/library/collections.html
Written by
Mohamed Ashour
Spending most of my time writing code, designing architecture, and creating pipelines to generate data. And the rest of it learning new things to write about.