ETL Project for beginners
Simple ETL Web scrapping project where we extract data from a json, Transform the json data and Load it in one csv file
Loading JSON file
Transform JSON file
Load json file into CSV
Working json bank_data.json
details
Imports
import glob
import pandas as pd
Extract
We write a JSON extract function extract_from_json
to read data from JSON and store it in pandas dataframe
<script>alert(1)</script>
def extract_from_json(file_to_process):
dataframe = pd.read_json(file_to_process)
return dataframe
Extract Function
columns=['Name','Market Cap (US$ Billion)']
def extract():
extracted_data = pd.DataFrame(columns=columns)
for jsonfile in glob.glob("bank_data.json"):
extracted_data = extracted_data.append(extract_from_json(jsonfile), ignore_index=True)
return extracted_data
In the above code we have extracted data from json and adding columns for the pandas
Now we have a csv file with name exchange_rates.csv
in that we have a country code and Rates with comma seperated
,Rates
AUD,1.2970883369
BGN,1.6086527389
BRL,5.4091955914
CAD,1.2714262214
CHF,0.8860832374000001
CNY,6.4831386741
CZK,21.5101167955
DKK,6.1197565389
EUR,0.8225037013000001
GBP,0.7323984208000001
We will load exchange_rates.csv as a dataframe and find the exchange rate for British pounds with the symbol GBP, store it in the variable exchange_rate
.
df = pd.read_csv('exchange_rates.csv', index_col=[0])
for index,row in df.iterrows():
if(index == 'GBP'):
print('Rates', row.Rates)
exchange_rate = row.Rates
Transform
Using exchange_rate and the exchange_rates.csv file we will find the exchange rate of USD to GBP and will write a transform function that
Rounds the Market Cap (US$ Billion) column to 3 decimal places
Rename Market Cap (US$ Billion) to Market Cap (GBP$ Billion)
def transform(data):
data['Market Cap (GBP$ Billion)'] = round(data['Market Cap (GBP$ Billion)'] * exchange_rate, 3)
return data
Load
In this section we will load the transformed data into a new csv file names transformed_data.csv
def load(targetfile,data_to_load):
data_to_load.to_csv(targetfile)
Now the coding is done we will test is it working as expected or not
We will execute all the methods(extract, transform and load) to see the output
Extract code execution
extracted_data = extract()
extracted_data.head()
Transform code execution
extracted_data.rename(columns={'Market Cap (US$ Billion)': 'Market Cap (GBP$ Billion)'}, inplace = True)
transformed_data = transform(extracted_data)
transformed_data.head()
Load code execution
load('bank_market_cap_gbp.csv',transformed_data)
Thanks for reading, Have a happy learning :)
Subscribe to my newsletter
Read articles from krishna chaitanya directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by