Ask Questions: Understanding Your Data

When working with data, the first step to making sense of it is to ask the right questions. These questions guide your exploration and help you gather insights before diving into any analysis or modeling.

How Big is the Data?
Check the number of rows and columns with the .shape attribute (no parentheses) to understand the dataset's scale.
What Does the Data Look Like?
Preview the data with .head() to get a sense of the columns and values.
What is the Data Type of Columns?
Use the .dtypes attribute (no parentheses) to identify each column's data type and confirm it is suitable for the analysis.
Are There Any Missing Values?
Count the missing values in each column with .isnull().sum() and decide how to handle them.
How Does the Data Look Mathematically?
Use .describe() to get summary statistics for the numeric columns, such as the mean, the median (reported as the 50% row), and the standard deviation.
Are There Duplicate Values?
Identify duplicates with .duplicated().sum() and remove them if necessary.
How Are the Columns Correlated?
Check pairwise relationships between numeric columns using .corr() and visualize the correlations to guide feature selection.

Example: Data Exploration with Pandas

import pandas as pd
# Example: Assume we have a CSV file named 'data.csv'
df = pd.read_csv('data.csv')

# 1. How Big is the Data?
print("Data Shape (Rows, Columns):", df.shape)

# 2. What Does the Data Look Like? (View the first 5 rows)
print("\nFirst 5 Rows of the Data:")
print(df.head())

# 3. What is the Data Type of Columns?
print("\nData Types of Columns:")
print(df.dtypes)

# 4. Are There Any Missing Values?
print("\nMissing Values in Each Column:")
print(df.isnull().sum())

# 5. How Does the Data Look Mathematically? (Summary Statistics)
print("\nSummary Statistics of Numerical Columns:")
print(df.describe())

# 6. Are There Duplicate Values?
print("\nNumber of Duplicate Rows:")
print(df.duplicated().sum())

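# 7. How Are the Columns Correlated?
# numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.x
print("\nCorrelation Matrix (Numeric Columns):")
print(df.corr(numeric_only=True))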
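
The checks above only report problems; the questions on missing values and duplicates also ask you to decide what to do about them. The following is a minimal sketch of one possible follow-up, not a recommended recipe. The toy DataFrame and its column names (age, income, region) are hypothetical stand-ins for 'data.csv', and median filling is just one of several reasonable strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for 'data.csv'
df = pd.DataFrame({
    "age": [25, 32, np.nan, 32, 41],
    "income": [40000, 52000, 61000, 52000, np.nan],
    "region": ["north", "south", "south", "south", None],
})

# Drop exact duplicate rows (the row at index 3 repeats the row at index 1)
deduped = df.drop_duplicates()

# Assumed strategy: fill numeric gaps with each column's median,
# and label missing text values explicitly
cleaned = deduped.copy()
num_cols = cleaned.select_dtypes(include="number").columns
cleaned[num_cols] = cleaned[num_cols].fillna(cleaned[num_cols].median())
cleaned["region"] = cleaned["region"].fillna("unknown")

# Correlation is only defined for numeric columns
corr = cleaned.corr(numeric_only=True)

print("Rows before/after removing duplicates:", len(df), len(deduped))
print(cleaned)
print(corr)
```

Dropping duplicates before filling keeps the medians from being skewed by repeated rows. Whether to fill, drop, or flag missing values depends on the dataset, so treat the choices here as placeholders.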

Day 15 - Understanding Your Data by Asking Questions

Nischal Baidar
