Important functions in Data Analysis and Visualizations
1) Zip function
The zip() function in Python is a built-in function that combines elements from multiple iterables (such as lists, tuples, or strings). It returns an iterator of tuples, where the i-th tuple contains the i-th element from each input iterable. It stops as soon as the shortest input iterable is exhausted.
Example -
names = ['Alice', 'Bob', 'Charlie']
ages = [25, 30, 22]
zipped_data = zip(names, ages)
for name, age in zipped_data:
    print(f'{name} is {age} years old')
- Output -
Alice is 25 years old
Bob is 30 years old
Charlie is 22 years old
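To illustrate the "shortest iterable" rule, here is a small sketch with lists of unequal length. The extra name 'Dave' is made-up sample data. Because it has no matching age, zip silently drops it.

```python
names = ['Alice', 'Bob', 'Charlie', 'Dave']  # 'Dave' has no matching age
ages = [25, 30, 22]

# zip stops after the third pair because ages runs out first
pairs = list(zip(names, ages))
print(pairs)       # [('Alice', 25), ('Bob', 30), ('Charlie', 22)]
print(len(pairs))  # 3
```

On Python 3.10 and later, passing strict=True to zip makes it raise a ValueError instead of truncating, which can catch misaligned columns early.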
Note - You can also use zip together with the * operator to unzip paired data back into separate sequences. zip(*pairs) returns tuples, so wrap each one in list() if you need lists. You can also use zip to create a dictionary from two lists:
keys = ['name', 'age']
values = ['Alice', 25]
data_dict = dict(zip(keys, values))
print(data_dict) # OUTPUT ---> {'name': 'Alice', 'age': 25}
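The unzip note above can be sketched as follows. The pairs list is sample data mirroring the earlier names and ages. The * operator passes each pair to zip as a separate argument, so zip regroups all first items together and all second items together.

```python
# Paired records, e.g. what list(zip(names, ages)) would produce
pairs = [('Alice', 25), ('Bob', 30), ('Charlie', 22)]

# *pairs unpacks the list into three arguments for zip
names, ages = zip(*pairs)
print(names)  # ('Alice', 'Bob', 'Charlie')
print(ages)   # (25, 30, 22)

# zip yields tuples; convert explicitly if a list is needed
names_list = list(names)
print(names_list)  # ['Alice', 'Bob', 'Charlie']
```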
2) Lambda function
- A lambda function is a concise way to create a small anonymous (unnamed) function whose body is a single expression. Syntax -
lambda arguments: expression
- Example - To square a given number, here is a regular function and its equivalent lambda function:
# Regular function
def square(x):
    return x**2
# Equivalent lambda function
lambda_square = lambda x: x**2
print(square(5)) # Output: 25
print(lambda_square(5)) # Output: 25
- Note - Lambda functions are often used with the map() and filter() functions to process sequences of data.
# MAP FUNCTION
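numbers = [1, 2, 3, 4, 5]
# map() returns a lazy iterator, so wrap it in list() to get the values
squared_numbers = list(map(lambda x: x**2, numbers))
print(squared_numbers)  # Output: [1, 4, 9, 16, 25]
# FILTER FUNCTION
# filter() keeps only the items for which the lambda returns True
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
print(even_numbers)  # Output: [2, 4]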
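Another common place for a lambda is the key argument of the built-in sorted(). As a small sketch, reusing name/age pairs like those from the zip section, this sorts people by age:

```python
people = [('Alice', 25), ('Bob', 30), ('Charlie', 22)]

# key receives each tuple; the lambda picks out the age (index 1)
by_age = sorted(people, key=lambda person: person[1])
print(by_age)  # [('Charlie', 22), ('Alice', 25), ('Bob', 30)]

# reverse=True sorts from oldest to youngest
oldest_first = sorted(people, key=lambda person: person[1], reverse=True)
print(oldest_first)  # [('Bob', 30), ('Alice', 25), ('Charlie', 22)]
```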
3) Cumsum
cumsum() computes the cumulative (running) sum of a column in a DataFrame: each row holds the sum of all values up to and including that row. Example -
import pandas as pd
# Sample DataFrame
data = {'Values': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Cumulative sum of the 'Values' column
df['Cumulative_Sum'] = df['Values'].cumsum()
print(df)
Output :
Values Cumulative_Sum
0 1 1
1 2 3
2 3 6
3 4 10
4 5 15
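To make the running-sum idea concrete, here is a sketch that computes the same column in plain Python with itertools.accumulate and checks that it matches the pandas result:

```python
from itertools import accumulate

import pandas as pd

values = [1, 2, 3, 4, 5]
df = pd.DataFrame({'Values': values})

# pandas: each row holds the sum of all values up to and including that row
pandas_result = df['Values'].cumsum().tolist()

# plain Python: accumulate adds each value to the running total
python_result = list(accumulate(values))

print(pandas_result)  # [1, 3, 6, 10, 15]
print(python_result)  # [1, 3, 6, 10, 15]
```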
4) Cut
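cut() is used for binning, or discretization: it maps continuous values into discrete intervals (bins). By default each interval is closed on the right, so bins=[0, 2, 4, 6] creates the intervals (0, 2], (2, 4] and (4, 6], and a value equal to an inner edge falls into the lower bin.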
import pandas as pd
# Binning 'Values' into three bins: Low, Medium, High
data = {'Values': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
bins = [0, 2, 4, 6]
labels = ['Low', 'Medium', 'High']
df['Category'] = pd.cut(df['Values'], bins=bins, labels=labels)
print(df)
Output -
Values Category
0 1 Low
1 2 Low
2 3 Medium
3 4 Medium
4 5 High
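The right parameter of pd.cut controls which side of each interval is closed. In this sketch, right=False turns the same edges into [0, 2), [2, 4) and [4, 6), so the boundary values 2 and 4 move up into the next bin:

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])
bins = [0, 2, 4, 6]
labels = ['Low', 'Medium', 'High']

# Default (right=True): intervals (0, 2], (2, 4], (4, 6]
right_closed = pd.cut(values, bins=bins, labels=labels)

# right=False: intervals [0, 2), [2, 4), [4, 6)
left_closed = pd.cut(values, bins=bins, labels=labels, right=False)

print(list(right_closed))  # ['Low', 'Low', 'Medium', 'Medium', 'High']
print(list(left_closed))   # ['Low', 'Medium', 'Medium', 'High', 'High']
```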
5) Qcut
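qcut() is used for quantile-based binning. Instead of taking fixed edges, it computes the edges from the data's quantiles so that each interval holds roughly the same number of points. Exactly equal counts are not always possible, for example with 5 values in 3 bins.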
import pandas as pd
# Quantile-based binning of 'Values' into three bins
data = {'Values': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
df['Quantile_Category'] = pd.qcut(df['Values'], q=3, labels=['Low', 'Medium', 'High'])
print(df)
Output -
Values Quantile_Category
0 1 Low
1 2 Low
2 3 Medium
3 4 High
4 5 High
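To see where qcut put the boundaries, you can pass retbins=True, which also returns the computed bin edges. Here is a sketch on the same five values. The 1/3 and 2/3 quantiles come out at about 2.333 and 3.667, which is why 4 lands in the High bin.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])
labels = ['Low', 'Medium', 'High']

# retbins=True returns a pair: (binned values, bin edges)
categories, edges = pd.qcut(values, q=3, labels=labels, retbins=True)

print(edges.round(3).tolist())  # [1.0, 2.333, 3.667, 5.0]

# Count how many values fell into each bin
counts = {label: list(categories).count(label) for label in labels}
print(counts)  # {'Low': 2, 'Medium': 1, 'High': 2}
```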
6) Generator function
A generator function in Python is a special type of function that lets you iterate over a potentially large sequence of data without building the entire sequence in memory at once. It uses the yield keyword to produce values one at a time. Each time the next value is requested, the function resumes where it left off, runs until the next yield, and pauses again. This makes it memory-efficient compared to generating a full list. Example -
def fibonacci_generator(limit):
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b

# Using the generator function
fibonacci_limit = 20
fibonacci_gen = fibonacci_generator(fibonacci_limit)
# Iterating through the generator
for number in fibonacci_gen:
    print(number, end=' ')
Output: 0 1 1 2 3 5 8 13
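To see the pausing behavior, you can drive the generator by hand with the built-in next(). Each call resumes the function right after the last yield. Once the while loop ends, the generator is exhausted and produces nothing more. Here is a sketch that repeats the same function so it runs on its own:

```python
def fibonacci_generator(limit):
    # Yield Fibonacci numbers smaller than limit, one at a time
    a, b = 0, 1
    while a < limit:
        yield a
        a, b = b, a + b

gen = fibonacci_generator(20)

# Each next() call runs the function up to the next yield
first_three = [next(gen), next(gen), next(gen)]
print(first_three)  # [0, 1, 1]

# The remaining values continue from where the generator paused
rest = list(gen)
print(rest)  # [2, 3, 5, 8, 13]

# An exhausted generator yields nothing more (next(gen) would raise StopIteration)
leftover = list(gen)
print(leftover)  # []
```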
Written by
Prakhar
Here to explore, learn & share!