Date Manipulation with Pandas
Here's the task: generate every date between a start date and an end date, then extract the year, the month, and the quarter number from each one.
import pandas as pd
start = '2018-01-01'  # ISO format avoids day/month ambiguity
end = '2020-12-31'
df = pd.DataFrame(columns=['Year', 'Month', 'Quarter'])
Let's generate every date between the start date and the end date:
dates = pd.date_range(start, end)
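As a quick sanity check, here is a minimal sketch of what the range looks like. It assumes ISO-formatted bounds and the default daily frequency; the names n_days, first_day and last_day are only for illustration.

```python
import pandas as pd

# Both endpoints are included, and the default frequency is one entry per day.
dates = pd.date_range("2018-01-01", "2020-12-31")

n_days = len(dates)      # 365 + 365 + 366, since 2020 is a leap year
first_day = dates[0]
last_day = dates[-1]

print(n_days, first_day.date(), last_day.date())
```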
Now extract the year:
df['Year'] = dates.to_period('Y')  # column assignment sizes the empty frame to len(dates)
Now extract the month:
df['Month'] = dates.to_period('M').astype(str).str.replace(r'^\d*-', '', regex=True)
Why use a regex?
Without it, pandas would return the month as 2018-01. The regex removes the leading digits along with the - and leaves just the month number (01).
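To see the month regex in isolation, here is a small sketch on three month-start dates. The variable names sample, as_text and months are made up for this illustration.

```python
import pandas as pd

# First day of January, February and March 2018.
sample = pd.date_range("2018-01-01", periods=3, freq="MS")

# Monthly periods render as 'YYYY-MM' strings.
as_text = sample.to_period("M").astype(str)

# Drop the leading digits and the hyphen, keeping only the month part.
months = as_text.str.replace(r"^\d*-", "", regex=True)

print(list(as_text))
print(list(months))
```

Note that the result is still text, so the leading zero is kept.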
Now extract the quarter number:
df['Quarter'] = dates.to_period('Q').astype(str).str.replace(r'^\d*', '', regex=True)
Why use a regex?
Without it, pandas would return the quarter as 2018Q1. The regex removes the leading digits and leaves just the quarter label (Q1).
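If plain integers are enough, the regex step can probably be skipped altogether. The sketch below is an alternative, not the approach used above: it reads the year, month and quarter attributes straight from the DatetimeIndex. The frame name calendar is hypothetical.

```python
import pandas as pd

dates = pd.date_range("2018-01-01", "2020-12-31")

# DatetimeIndex exposes year, month and quarter directly as integers.
calendar = pd.DataFrame(
    {"Year": dates.year, "Month": dates.month, "Quarter": dates.quarter},
    index=dates,
)

print(calendar.head())
```

The trade-off is the output type: this gives 1 and 4 rather than the strings 01 and Q4, so the to_period route is still the one to use when the formatted labels are needed.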
Written by
Monojit Sarkar